This paper addresses the problem of estimating a convex regression function under both the sup-norm risk and the pointwise risk using B-splines. The presence of the convex constraint complicates various issues in asymptotic analysis, particularly uniform convergence analysis. To overcome this difficulty, we establish the uniform Lipschitz property of optimal spline coefficients in the $\ell_\infty$-norm by exploiting piecewise linear and polyhedral theory. Based upon this property, it is shown that this estimator attains optimal rates of convergence on the entire interval of interest over the Hölder class under both risks. In addition, adaptive estimates are constructed under both the sup-norm risk and the pointwise risk when the exponent of the Hölder class is between one and two. These estimates achieve a maximal risk within a constant factor of the minimax risk over the Hölder class.

1. Introduction. Consider the convex regression problem of the form

$$y_k = f(x_k) + \sigma\epsilon_k,\qquad k = 1,\dots,n, \qquad (1.1)$$

where $f:[0,1]\to\mathbb{R}$ is a convex function, the $\epsilon_k$ are independent standard normal errors, and $x_i = i/n$, $i = 1,\dots,n$, are the design points. Let

$$\mathcal{C} = \left\{ f:[0,1]\to\mathbb{R} \;\middle|\; \frac{f(y)-f(x)}{y-x} \le \frac{f(z)-f(y)}{z-y}\ \text{ if } 0\le x<y<z\le 1 \right\}$$

be the collection of convex functions on $[0,1]$. The goal of this paper is to estimate $f\in\mathcal{C}$ and to analyze the performance of the estimate under both the sup-norm risk and the pointwise risk.

Shape-restricted inference has a wide range of applications and has received rapidly growing interest in diverse areas. Examples include reliability (survival functions, hazard functions), medicine (dose-response curves), finance (option prices and delivery prices), and astronomy (mass functions). Much effort has focused on monotone estimation via the least squares approach (i.e., Brunk's estimator) [1, 5, 30, 34]. For convex or concave regression, the least squares estimator was originally proposed in [17], and its asymptotic properties have been studied in [11, 14, 16, 26]. However, least squares estimators suffer from several major deficiencies: (i) they lack smoothness; (ii) they have a non-normal asymptotic distribution [14, 45] with low convergence rates (e.g., of order $n^{-1/3}$ for Brunk's estimator) regardless of the smoothness of the true function; and (iii) they are inconsistent at the boundary and have a non-negligible asymptotic bias [44].
Other estimation procedures have also been developed for shape-restricted inference. For instance, Mammen and Thomas-Agnan [27] studied constrained smoothing splines, but their computation is highly complicated; see [38] for a related result via control-theoretic splines. A two-step estimator was proposed in [3]: it isotonizes a derivative estimator and then obtains a convex estimator by integrating the monotone derivative. Meyer [29] developed an algorithm for cubic monotone estimation with extensions to convex constraints and other variants, e.g., increasing-concave constraints. A penalized monotone B-spline estimator was treated in [37], where its asymptotic behavior was analyzed. Additional results include [15, 30, 32, 43], to name just a few. In spite of this progress, many critical questions remain open in convex regression and its asymptotic analysis, especially those related to adaptive estimation over a function class. A major bottleneck in adaptive asymptotic analysis is the lack of uniform convergence properties of an estimator when a shape constraint is imposed.

In this paper, we consider estimation of a convex function in the Hölder class. Let $\mathcal{H}_L^r$ denote the Hölder class

$$\mathcal{H}_L^r := \left\{ f : |f^{(\ell)}(x_1) - f^{(\ell)}(x_2)| \le L|x_1 - x_2|^{\gamma},\ \forall x_1, x_2\in[0,1] \right\},$$

where $\ell$ is the nonnegative integer such that $\gamma := r-\ell\in(0,1]$. Let $\mathcal{C}_H(r,L) = \mathcal{C}\cap\mathcal{H}_L^r$ be the collection of functions in both $\mathcal{C}$ and $\mathcal{H}_L^r$. Since a convex function on $[0,1]$ must be Lipschitz continuous, i.e., $\gamma = 1$ and $\ell = 0$, we have $r\ge 1$ for any $f\in\mathcal{C}_H(r,L)$. It is well known that, for a fixed $r$, there exists an estimator, depending on $r$, that achieves the optimal rate of convergence over $\mathcal{H}_L^r$ [40]. For example, the minimax sup-norm risk over $\mathcal{H}_L^r$ has the asymptotic order

$$\inf_{\hat f}\ \sup_{f\in\mathcal{H}_L^r} E\|\hat f - f\|_\infty \asymp \left(\frac{\log n}{n}\right)^{\frac{r}{2r+1}}, \qquad (1.2)$$

where $a\asymp b$ means that $a/b$ is bounded from below and above by two positive constants. However, the existence of an adaptive estimator (independent of $r$) that achieves the convergence rate in (1.2) uniformly over $r$ is more subtle. When the sup-norm risk is considered, a series of papers, e.g., [2, 9, 20, 24], have shown that the kernel estimator can be used to construct such an adaptive estimator. On the other hand, when the pointwise risk is considered, a fully adaptive procedure attaining the pointwise minimax rate for every $r$ simultaneously does not exist, and a logarithmic penalty term must occur [4, 21]. Specifically, for any $x_0\in(0,1)$, there exists a positive constant $\pi_1$ such that

(1.3)

Other approaches for pointwise adaptive estimation are reported in [23, 41], where a similar phenomenon occurs. For general discussions of adaptive methods for unconstrained functions, see [31, 42] and the references therein.

When a shape constraint is imposed, it was first noted in [19] that the constraint does not improve the optimal rate of convergence. Further, it was found in [25] that the additional difference-order constraint completely changes the adaptive estimation problem. In particular, Low and Kang [25] proposed a pointwise rate-adaptive procedure for monotone estimation in the minimax sense with respect to a Lipschitz parameter. Unfortunately, when this procedure is applied at every point of an interval, it does not yield a monotone function as an estimate. An adaptive monotone estimation procedure is given in [6], which studied the least squares estimator and showed that the attained rate of the probabilistic error holds uniformly over a shrinking $L_2$-neighborhood of the true function. Other related papers on adaptive convex estimation include [11].

The present paper proposes a B-spline estimator with an arbitrary spline degree for convex regression. The convex shape constraint on the estimator is converted into an analogous constraint on the spline coefficients. In addition to its conceptual simplicity and numerical efficiency, the obtained B-spline estimator is globally convex, smooth for a suitable choice of spline degree, and consistent at the boundary (as well as in the interior) for a proper choice of the number of spline bases. The major part of the paper is devoted to adaptive asymptotic analysis of the B-spline estimator on $\mathcal{C}_H(r,L)$ under both the sup-norm and pointwise risks. Toward this end, it is essential to establish certain uniform convergence properties of the B-spline estimator. However, challenging issues arise due to the presence of constraints. For example, the optimal spline coefficients do not have a closed form in general. Instead, they are characterized by complementarity conditions [13, 36] that give rise to a nonsmooth piecewise linear function of the observation data. Due to the nonsmooth and combinatorial nature of complementarity problems, a thorough understanding of the complementarity conditions and the associated piecewise linear function is far from trivial. In this paper, we exploit optimization techniques, along with adaptive asymptotic statistical tools, to tackle these problems. The major contributions of the paper are:

1. As a key technical contribution of the paper, we establish the uniform Lipschitz property of optimal spline coefficients with respect to the $\ell_\infty$-norm via piecewise linear and polyhedral theory (cf. Theorem 3.1). Unlike the conventional Lipschitz property in the $\ell_2$-norm (which is trivial to show), the Lipschitz property in the $\ell_\infty$-norm requires a nontrivial argument that takes full advantage of the convex shape constraint. It yields a uniform sup-norm bound on variations of spline coefficients regardless of the number of spline bases, leading to more precise and less conservative error estimates in uniform convergence analysis. This property paves the way for asymptotic analysis (cf. Propositions 7.1-7.3) and the construction of adaptive procedures.

2. By exploiting the uniform Lipschitz property, we obtain the following results in adaptive asymptotic analysis:

(2.1) For a fixed order $r$, the proposed B-spline estimator achieves the optimal minimax rate of convergence on $\mathcal{C}_H(r,L)$ under both the sup-norm and pointwise risks (cf. Section 3.1). This result gives rise to an optimal choice of the number of spline bases. Unlike the widely studied least squares convex estimator, the B-spline estimator achieves optimal convergence rates on the entire interval $[0,1]$ under both the sup-norm and pointwise risks (cf. Theorem 3.2), thus leading to uniform consistency on $[0,1]$.

(2.2) Adaptive estimators are constructed under both the sup-norm and pointwise risks over $\mathcal{C}_H(r,L)$ with $r\in[1,2]$. These estimates achieve a maximum risk within a constant factor of the minimax risk over the Hölder class (cf. Section 3.2). In particular, the pointwise adaptive estimator attains convexity on the interval $[0,1]$ as well as the minimax risk over an entire range of values of $r\in[1,2]$ and $L$.

(2.3) A brief discussion of variance estimation is given in Section 3.3.

The paper is organized as follows. Section 2 formulates the B-spline convex estimator and develops optimality conditions for the spline coefficients. The main results of the paper are presented in Section 3, including the uniform Lipschitz property and its implications in adaptive asymptotic analysis. Potential extensions and future research directions are discussed in Section 4. The technical proofs of the main results are given in Sections 5-10.

2. Formulation and Optimality Conditions. Denote the $p$th degree B-spline basis with knots $0 = \kappa_0 < \kappa_1 < \cdots < \kappa_{K_n} = 1$ by $\{B_k^{[p]} : k = 1,\dots,K_n+p\}$. For simplicity, we consider equally spaced knots, namely, $\kappa_1 = 1/K_n$, $\kappa_2 = 2/K_n,\dots,\kappa_{K_n} = 1$. The value of $K_n$ will depend upon $n$ as discussed below. Assume that $n/K_n$ is an integer, denoted by $M_n$. We consider the following convex spline estimator:

$$\hat f^{[p]}(x) = \sum_{k=1}^{K_n+p}\hat b_k B_k^{[p]}(x),$$

where the spline coefficients $\hat b = (\hat b_k,\ k = 1,\dots,K_n+p)$ minimize

$$\sum_{i=1}^{n}\Big(y_i - \sum_{k=1}^{K_n+p} b_k B_k^{[p]}(x_i)\Big)^2 \qquad (2.1)$$

subject to the convex constraint $\Delta^2 b_k\ge 0$, $k = 3,\dots,K_n+p$, where $\Delta$ is the backward difference operator such that $\Delta b_k = b_k - b_{k-1}$ and $\Delta^2 = \Delta\Delta$.

Let $X = [B_k^{[p]}(x_i)]_{i,k}$ be the $n\times(K_n+p)$ design matrix, and denote $\beta_n = \sum_{i=1}^n\big(B_k^{[p]}(x_i)\big)^2$ for $k = p+1,\dots,K_n$ (for these interior basis functions the sum does not depend on $k$, since the knots and design points are equally spaced and $n/K_n$ is an integer). Given a spline degree $p$, $\beta_nK_n/n$ converges to a positive constant (depending on $p$ only) as $n/K_n\to\infty$. Thus there exists a positive constant $C_{\beta,p}$ (depending on $p$ only) such that

$$\beta_n\ge C_{\beta,p}\,\frac{n}{K_n},\qquad\forall\, n, K_n. \qquad (2.2)$$

Define the positive definite matrix $\Lambda_p := X^\top X/\beta_n\in\mathbb{R}^{(K_n+p)\times(K_n+p)}$ and $\bar y := X^\top y/\beta_n$, where $y = (y_1,\dots,y_n)^\top$ (we drop the subscript $p$ in $\Lambda_p$ for notational simplicity). It is easy to verify that, for a given spline degree $p$, $\Lambda$ is a $(2p+1)$-banded matrix. For instance, when $p = 1$, $\Lambda$ is tridiagonal. The convex constraint on the spline coefficients is characterized by the following polyhedral cone:

$$\Omega := \left\{ b\in\mathbb{R}^{K_n+p} : b_k - 2b_{k+1} + b_{k+2}\ge 0,\ k = 1,\dots,K_n+p-2 \right\}.$$

When the knots are equally spaced, it is easy to verify that if the B-spline coefficient vector is in $\Omega$, then $\hat f^{[p]}$ is a convex function. Writing (2.1) in matrix notation, the underlying optimization problem becomes the following equivalent constrained quadratic program:

$$\hat b = \arg\min_{b\in\Omega}\ \frac12 b^\top\Lambda b - b^\top\bar y. \qquad (2.3)$$
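To make $X$, $\beta_n$, $\Lambda$ and $\bar y$ concrete, the following NumPy sketch builds them for the simplest case $p = 1$ (linear B-splines, i.e., hat functions) on equally spaced knots with $x_i = i/n$. The choice $p = 1$, the test function, the noise level, and the helper name `hat_design` are illustrative assumptions, not part of the paper. The sketch only checks the banded structure of $\Lambda$ and the scaling $\beta_n\asymp n/K_n$ behind (2.2).

```python
import numpy as np

def hat_design(n, K):
    """n x (K + 1) design matrix of linear B-splines (p = 1, hat functions)
    on the equally spaced knots 0, 1/K, ..., 1, evaluated at x_i = i / n."""
    x = np.arange(1, n + 1) / n
    centers = np.arange(K + 1) / K
    return np.maximum(0.0, 1.0 - K * np.abs(x[:, None] - centers[None, :]))

n, K = 1000, 10                          # M_n = n / K_n = 100 is an integer
rng = np.random.default_rng(0)
x = np.arange(1, n + 1) / n
y = (x - 0.4) ** 2 + 0.1 * rng.standard_normal(n)   # convex f plus N(0, 0.1^2) noise

X = hat_design(n, K)
beta_n = np.sum(X[:, 1] ** 2)            # interior basis function k = p + 1 = 2 (column 1)
Lam = X.T @ X / beta_n                   # Lambda = X^T X / beta_n
ybar = X.T @ y / beta_n                  # ybar = X^T y / beta_n

# For p = 1, Lambda is (2p + 1) = 3-banded, i.e. tridiagonal.
i, j = np.indices(Lam.shape)
off_band = np.abs(Lam[np.abs(i - j) > 1]).max()
print("largest entry outside the band:", off_band)
# beta_n * K / n approaches K * (integral of a squared hat function) = 2/3 for p = 1
print("beta_n * K / n =", beta_n * K / n)
```

For general $p$ one would evaluate the degree-$p$ basis instead of the hat functions; the bandwidth check then uses $|i-j|>p$.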

We first characterize the optimality conditions for $\hat b$. The conditions are represented by complementarity conditions, which play a crucial role in addressing analytic and statistical properties of the estimator. We provide a short introduction to the complementarity condition. Two vectors $u = (u_1,\dots,u_d)^\top$ and $v = (v_1,\dots,v_d)^\top$ in $\mathbb{R}^d$ are said to satisfy the complementarity condition [7] if $u_i\ge 0$, $v_i\ge 0$, and $u_iv_i = 0$ for all $i = 1,\dots,d$. This condition can be put in the more compact vector form $0\le u\perp v\ge 0$, where $u\perp v$ means that the two vectors are orthogonal, i.e., $u^\top v = 0$.

We introduce additional notation. Let

$$C := \begin{bmatrix}1&&&\\ 1&1&&\\ \vdots&\vdots&\ddots&\\ 1&1&\cdots&1\end{bmatrix}\in\mathbb{R}^{(K_n+p)\times(K_n+p)},$$

and let $D_2\in\mathbb{R}^{(K_n+p-2)\times(K_n+p)}$ be the second-order difference matrix such that $D_2b = [\Delta^2(b_3),\dots,\Delta^2(b_{K_n+p})]^\top$; see (5.3) for the explicit form of $D_2$.

Theorem 2.1. Let $C_{d\bullet}$ denote the $d$th row of $C$, and, for an index set $\mathcal{J}$, let $C_{\mathcal{J}\bullet}$ denote the submatrix formed by the rows indexed by $\mathcal{J}$. The necessary and sufficient conditions for $b\in\Omega$ to minimize (2.3) are

$$0\le D_2b\ \perp\ C_{\mathcal{I}\bullet}C(\Lambda b-\bar y)\ \ge 0, \qquad (2.4)$$

$$C_{(K_n+p)\bullet}(\Lambda b-\bar y) = C_{(K_n+p)\bullet}C(\Lambda b-\bar y) = 0, \qquad (2.5)$$

where the index set $\mathcal{I} := \{1,\dots,K_n+p-2\}$.
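Theorem 2.1 can be checked numerically. The sketch below (again assuming $p = 1$ hat functions for brevity) solves (2.3) by writing $b = \theta_1v^1 + \theta_2v^2 + \sum_{k\ge 3}\theta_kv^k$ with the cone generators from (5.1), so that $\theta_k = \Delta^2b_k\ge 0$ for $k\ge 3$ while $\theta_1,\theta_2$ are free. After projecting out the two free directions, the remaining nonnegative least squares problem is solved by a small Lawson-Hanson routine. This reparametrization and the solver are our own illustrative choices, not the authors' algorithm. The script then evaluates (2.4) and (2.5) with $C$ the lower-triangular matrix of ones.

```python
import numpy as np

def hat_design(n, K):
    # linear B-splines (p = 1) on knots 0, 1/K, ..., 1, evaluated at x_i = i/n
    x = np.arange(1, n + 1) / n
    centers = np.arange(K + 1) / K
    return np.maximum(0.0, 1.0 - K * np.abs(x[:, None] - centers[None, :]))

def generators(m):
    # columns are v^1, ..., v^m from (5.1)
    j = np.arange(1, m + 1)
    V = np.empty((m, m))
    V[:, 0] = 2 - j                          # v^1 = (1, 0, -1, ..., -(m - 2))
    V[:, 1] = j - 1                          # v^2 = (0, 1, ..., m - 1)
    for k in range(3, m + 1):
        V[:, k - 1] = np.maximum(0, j - k + 1)   # hinge v^k
    return V

def nnls(A, b, max_iter=500):
    # Lawson-Hanson active-set method: min ||A x - b||_2 subject to x >= 0
    n = A.shape[1]
    x = np.zeros(n)
    P = np.zeros(n, dtype=bool)              # passive set (coordinates allowed to be > 0)
    tol = 1e-10 * max(1.0, np.abs(A.T @ b).max())
    for _ in range(max_iter):
        w = A.T @ (b - A @ x)                # minus one half of the gradient of ||A x - b||^2
        if P.all() or w[~P].max() <= tol:
            break
        P[np.argmax(np.where(P, -np.inf, w))] = True
        while True:
            z = np.zeros(n)
            z[P] = np.linalg.lstsq(A[:, P], b, rcond=None)[0]
            if np.all(z[P] > 0):
                break
            bad = P & (z <= 0)               # move back towards x to keep feasibility
            step = np.min(x[bad] / (x[bad] - z[bad]))
            x = x + step * (z - x)
            P &= x > tol
            x[~P] = 0.0
        x = z
    return x

def convex_coefficients(X, y):
    # b = V theta with theta_1, theta_2 free and theta_k = Delta^2 b_k >= 0 for k >= 3
    V = generators(X.shape[1])
    A = X @ V
    Af, Ac = A[:, :2], A[:, 2:]
    Q, _ = np.linalg.qr(Af)

    def project(M):                          # remove the span of the free columns
        return M - Q @ (Q.T @ M)

    theta_c = nnls(project(Ac), project(y))
    theta_f = np.linalg.lstsq(Af, y - Ac @ theta_c, rcond=None)[0]
    return V @ np.concatenate([theta_f, theta_c])

n, K = 1000, 10
rng = np.random.default_rng(1)
x = np.arange(1, n + 1) / n
y = (x - 0.4) ** 2 + 0.1 * rng.standard_normal(n)
X = hat_design(n, K)
beta_n = np.sum(X[:, 1] ** 2)
Lam, ybar = X.T @ X / beta_n, X.T @ y / beta_n
b = convex_coefficients(X, y)

m = K + 1
D2 = np.zeros((m - 2, m))
for r in range(m - 2):
    D2[r, r:r + 3] = [1.0, -2.0, 1.0]
C = np.tril(np.ones((m, m)))                 # lower-triangular matrix of ones
grad = Lam @ b - ybar                        # Lambda b - ybar
dual = (C @ C @ grad)[: m - 2]               # C_{I.} C (Lambda b - ybar)
print("(2.4):", (D2 @ b).min(), dual.min(), (D2 @ b) @ dual)
print("(2.5):", grad.sum(), (C @ C @ grad)[-1])
```

Minimizing $\|y-Xb\|^2$ over $\Omega$ is equivalent to (2.3) because the two objectives differ by the factor $2\beta_n$ and a constant.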

2.1. Piecewise Linear Formulation of Optimal Spline Coefficients. It follows from Theorem 2.1 that $\hat b(\bar y)$ is characterized by mixed complementarity conditions. It is known from complementarity and polyhedral theory that $\hat b(\bar y)$ is a continuous piecewise linear function of $\bar y$ determined by an index set $\alpha = \{i\mid(D_2\hat b)_i = 0\}\subseteq\{1,\dots,K_n+p-2\}$ ($\alpha$ may be empty). Indeed, $\hat b$ has $2^{K_n+p-2}$ linear selection functions, each of which is denoted by $\hat b^\alpha$ and corresponds to the index set $\alpha$. Hence, the solution mapping $\bar y\mapsto\hat b$ is a (continuous) piecewise linear function with $2^{K_n+p-2}$ selection functions. The following proposition characterizes each linear selection function; its construction and proof are given in Section 5.2.

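Proposition 2.1. For each index set $\alpha\subseteq\{1,\dots,K_n+p-2\}$, let $\ell := K_n+p-|\alpha|$. Then there exists a row-independent matrix $F_\alpha\in\mathbb{R}^{\ell\times(K_n+p)}$ such that the linear selection function $\hat b^\alpha$ is given by

$$\hat b^\alpha(\bar y) = F_\alpha^\top(F_\alpha\Lambda F_\alpha^\top)^{-1}F_\alpha\bar y.$$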
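Proposition 2.1 can be illustrated without the construction of Section 5.2: for a fixed $\alpha$, $F_\alpha^\top(F_\alpha\Lambda F_\alpha^\top)^{-1}F_\alpha\bar y$ is the minimizer of $\frac12b^\top\Lambda b - b^\top\bar y$ over the subspace $\{b : (D_2b)_i = 0,\ i\in\alpha\}$, and it does not depend on which row basis of that subspace is used. The sketch below assumes $p = 1$, takes $F_\alpha$ from an SVD null-space basis rather than from (5.6)-(5.7), and uses hypothetical helper names and a hypothetical $\alpha$; indices of $\alpha$ are 0-based in the code.

```python
import numpy as np

def hat_design(n, K):
    # linear B-splines (p = 1) on knots 0, 1/K, ..., 1, evaluated at x_i = i/n
    x = np.arange(1, n + 1) / n
    centers = np.arange(K + 1) / K
    return np.maximum(0.0, 1.0 - K * np.abs(x[:, None] - centers[None, :]))

def second_difference_matrix(m):
    D = np.zeros((m - 2, m))
    for r in range(m - 2):
        D[r, r:r + 3] = [1.0, -2.0, 1.0]
    return D

def selection_basis(alpha, m):
    # orthonormal rows spanning {b in R^m : (D_2 b)_i = 0 for i in alpha}
    if len(alpha) == 0:
        return np.eye(m)
    _, _, Vt = np.linalg.svd(second_difference_matrix(m)[list(alpha)])
    return Vt[len(alpha):]                   # rows of D_2 are independent: m - |alpha| rows

def linear_selection(F, Lam, ybar):
    # b^alpha = F^T (F Lam F^T)^{-1} F ybar  (Proposition 2.1)
    return F.T @ np.linalg.solve(F @ Lam @ F.T, F @ ybar)

n, K = 400, 8
rng = np.random.default_rng(2)
x = np.arange(1, n + 1) / n
y = np.abs(x - 0.5) + 0.05 * rng.standard_normal(n)
X = hat_design(n, K)
beta_n = np.sum(X[:, 1] ** 2)
Lam, ybar = X.T @ X / beta_n, X.T @ y / beta_n

alpha = [0, 1, 4, 5]                         # hypothetical index set
F = selection_basis(alpha, K + 1)
b_alpha = linear_selection(F, Lam, ybar)
R = np.tril(np.ones((F.shape[0], F.shape[0])))   # any invertible matrix changes the basis
b_other = linear_selection(R @ F, Lam, ybar)
print(F.shape)                               # (5, 9) = (K + 1 - |alpha|, K + 1)
print(np.abs(b_alpha - b_other).max())       # basis invariance, up to rounding
```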

In view of the above proposition and its construction (cf. Section 5.2), a linear selection function corresponds to an index set $\alpha$ depending on $\bar y$ (or on $y$, by somewhat abusing notation). Consequently, the piecewise linear function $\hat b$ can be written as

$$\hat b(\bar y) = F_{\alpha(\bar y)}^\top\big(F_{\alpha(\bar y)}\Lambda F_{\alpha(\bar y)}^\top\big)^{-1}F_{\alpha(\bar y)}\bar y.$$

Let $N(x) := \big(B_1^{[p]}(x),\dots,B_{K_n+p}^{[p]}(x)\big)^\top$. For a given $y$, the convex B-spline estimator becomes

$$\hat f^{[p]}(x) = N^\top(x)\hat b(\bar y) = \frac{1}{\beta_n}\sum_{i=1}^{n}N^\top(x)F_{\alpha(\bar y)}^\top\big(F_{\alpha(\bar y)}\Lambda F_{\alpha(\bar y)}^\top\big)^{-1}F_{\alpha(\bar y)}N(x_i)\,y_i. \qquad (2.6)$$

Denote the kernel in (2.6) by $K_{\alpha(\bar y)}(s,t)$, i.e.,

$$K_{\alpha(\bar y)}(s,t) = N^\top(s)F_{\alpha(\bar y)}^\top\big(F_{\alpha(\bar y)}\Lambda F_{\alpha(\bar y)}^\top\big)^{-1}F_{\alpha(\bar y)}N(t).$$

Hence, the convex spline estimator is a kernel estimator. However, the kernel depends on the index set $\alpha$, which in turn relies on the observation $y$. Therefore, the estimator is not a linear but a piecewise linear function of $y$.
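On each fixed linear piece, the kernel form (2.6) reproduces every function whose coefficient vector lies in the subspace $\{b:(D_2b)_i = 0,\ i\in\alpha\}$. For $p = 1$ hat functions (an assumption of this sketch), constant and linear functions have coefficient vectors with zero second differences, so the weights $K_\alpha(s,x_i)/\beta_n$ sum to one and reproduce $x$ whatever $\alpha$ is. The snippet checks this for a hypothetical $\alpha$; it describes one linear piece, not the data-dependent choice $\alpha(\bar y)$.

```python
import numpy as np

def hat_basis(x, K):
    # rows N(x)^T for linear B-splines (p = 1) on knots 0, 1/K, ..., 1
    centers = np.arange(K + 1) / K
    return np.maximum(0.0, 1.0 - K * np.abs(np.asarray(x)[:, None] - centers[None, :]))

def selection_basis(alpha, m):
    # orthonormal rows spanning {b : (D_2 b)_i = 0, i in alpha}
    if len(alpha) == 0:
        return np.eye(m)
    D2 = np.zeros((m - 2, m))
    for r in range(m - 2):
        D2[r, r:r + 3] = [1.0, -2.0, 1.0]
    _, _, Vt = np.linalg.svd(D2[list(alpha)])
    return Vt[len(alpha):]

n, K = 300, 6
x = np.arange(1, n + 1) / n
X = hat_basis(x, K)                              # design matrix with rows N(x_i)^T
beta_n = np.sum(X[:, 1] ** 2)
Lam = X.T @ X / beta_n
F = selection_basis([1, 2, 3], K + 1)            # hypothetical index set (0-based)
middle = F.T @ np.linalg.solve(F @ Lam @ F.T, F)

s = np.linspace(0.0, 1.0, 11)                    # evaluation points
W = hat_basis(s, K) @ middle @ X.T / beta_n      # W[a, i] = K_alpha(s_a, x_i) / beta_n
print(W.sum(axis=1))                             # numerically equal to 1 in every row
print(np.abs(W @ x - s).max())                   # linear functions are reproduced
```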

3. Main Results. In this section, we exploit the piecewise linear formulation of $\hat b$ to establish the uniform Lipschitz property of $\hat b$ in the $\ell_\infty$-norm. Roughly speaking, this property says that $\hat b(\bar y)$ is a Lipschitz function of $\bar y$ with a uniform Lipschitz constant (with respect to the $\ell_\infty$-norm), regardless of $K_n$ and $\alpha$. This property is critical in establishing uniform consistency and developing adaptive estimators. Formally, this property is stated in the following theorem, whose proof is given in Section 6.

Theorem 3.1. Given a spline degree $p$, there exists a positive constant $C_{\infty,p}$ (dependent on $p$ only) such that

(1) for any $K_n$ and any index set $\alpha$, $\|F_\alpha^\top(F_\alpha\Lambda F_\alpha^\top)^{-1}F_\alpha\|_\infty\le C_{\infty,p}$;

(2) for any $K_n$, $\|\hat b(u)-\hat b(v)\|_\infty\le C_{\infty,p}\|u-v\|_\infty$, $\forall\,u,v\in\mathbb{R}^{K_n+p}$.
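Part (1) of Theorem 3.1 can be explored empirically. The sketch below (assuming $p = 1$, $n = 20K_n$ and randomly drawn index sets $\alpha$, all illustrative choices) computes the $\ell_\infty$ operator norm, i.e., the maximum absolute row sum, of $F_\alpha^\top(F_\alpha\Lambda F_\alpha^\top)^{-1}F_\alpha$ for several $K_n$. Since this matrix does not depend on the choice of row basis for the subspace, an SVD basis is used in place of (5.6)-(5.7). The printed values only illustrate the boundedness asserted by the theorem; they are neither a proof nor an estimate of $C_{\infty,p}$.

```python
import numpy as np

def hat_design(n, K):
    x = np.arange(1, n + 1) / n
    centers = np.arange(K + 1) / K
    return np.maximum(0.0, 1.0 - K * np.abs(x[:, None] - centers[None, :]))

def selection_operator(alpha, Lam):
    # F^T (F Lam F^T)^{-1} F for any row basis F of {b : (D_2 b)_i = 0, i in alpha}
    m = Lam.shape[0]
    if len(alpha) == 0:
        F = np.eye(m)
    else:
        D2 = np.zeros((m - 2, m))
        for r in range(m - 2):
            D2[r, r:r + 3] = [1.0, -2.0, 1.0]
        F = np.linalg.svd(D2[alpha])[2][len(alpha):]
    return F.T @ np.linalg.solve(F @ Lam @ F.T, F)

rng = np.random.default_rng(3)
worst = {}
for K in (10, 20, 40, 80):
    X = hat_design(20 * K, K)
    Lam = X.T @ X / np.sum(X[:, 1] ** 2)
    norms = []
    for _ in range(30):
        alpha = np.flatnonzero(rng.random(K - 1) < 0.5)   # random subset of {0, ..., K - 2}
        M = selection_operator(alpha, Lam)
        norms.append(np.abs(M).sum(axis=1).max())         # ell_infinity operator norm
    worst[K] = max(norms)
    print(K, round(worst[K], 3))
```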

Next, we apply the uniform Lipschitz property to derive optimal rates of convergence in Section 3.1, to construct adaptive estimators under both the sup-norm risk and the pointwise risk in Section 3.2, and to study variance estimation in Section 3.3.

3.1. Optimal Rate of Convergence. For any $f\in\mathcal{C}_H(r,L)$ with $r>1$, we write $\hat f_{(r)} := \hat f^{[p]}$ when the spline degree $p = \lceil r-1\rceil$ is used to fit the data. If $r = 1$, then $\hat f_{(r)} := \hat f^{[1]}$ with $p = 1$; namely, $\hat f_{(r)}$ is a piecewise linear spline. In the following, for a function $g:[0,1]\to\mathbb{R}$, let $\|g\|_\infty := \sup_{t\in[0,1]}|g(t)|$.

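Theorem 3.2. Assume $f\in\mathcal{C}_H(r,L)$. Then,

(1) If $K_n$ is chosen as

$$K_n = \left\lceil\left(\frac{L}{\sigma}\right)^{\frac{2}{2r+1}}\left(\frac{n}{\log n}\right)^{\frac{1}{2r+1}}\right\rceil,$$

then there exists a positive constant $C_{1r}$ dependent only on $r$ such that

$$\sup_{f\in\mathcal{C}_H(r,L)}E\big[\|\hat f_{(r)}-f\|_\infty\big]\le C_{1r}L^{\frac{1}{2r+1}}\sigma^{\frac{2r}{2r+1}}\left(\frac{\log n}{n}\right)^{\frac{r}{2r+1}}. \qquad (3.1)$$

(2) For any $x_0\in[0,1]$, if $K_n$ is chosen as

$$K_n = \left\lceil\left(\frac{L}{\sigma}\right)^{\frac{2}{2r+1}}n^{\frac{1}{2r+1}}\right\rceil,$$

then there exists a positive constant $C_{2r}$ dependent only on $r$ such that

$$\sup_{f\in\mathcal{C}_H(r,L)}E\big(|\hat f_{(r)}(x_0)-f(x_0)|^2\big)\le C_{2r}L^{\frac{2}{2r+1}}\sigma^{\frac{4r}{2r+1}}n^{-\frac{2r}{2r+1}}. \qquad (3.2)$$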
It is known that the maximum likelihood estimate of a convex function is inconsistent at the boundary, which is called the spiking problem [44]. In contrast, Theorem 3.2 shows that $\hat f_{(r)}$ is uniformly consistent on $[0,1]$. The optimal choice of $K_n$ is of order $(n/\log n)^{1/(2r+1)}$, and $\|\hat f_{(r)}-f\|_\infty$ achieves the optimal rate of convergence, which is of order $(n/\log n)^{-r/(2r+1)}$ [31]. Under the pointwise risk, the optimal choice of $K_n$ is of order $n^{1/(2r+1)}$, and the estimator thus achieves the optimal rate of convergence, which is of order $n^{-2r/(2r+1)}$ [40].

The next result shows that, for any $f\in\mathcal{C}_H(r,L)$ with $r>2$, the constrained spline estimator and the unconstrained spline estimator coincide with probability tending to one, provided that $f''(x)>0$ for all $x\in[0,1]$.

Theorem 3.3. Assume $f\in\mathcal{C}_H(r,L)$ with $r>2$ and $f''(x)\ge c>0$ for all $x\in[0,1]$. Let $\hat f^u$ be the unconstrained regression spline estimator. If $n^{-1}K_n^5\log n\to 0$ and $K_n\to\infty$ as $n\to\infty$, then

$$P\big(\hat f^u(x) = \hat f^{[p]}(x),\ \forall x\in[0,1]\big)\to 1.$$

Proof. Zhou et al. [47] studied the problem of estimating derivatives of a regression function using the corresponding derivatives of regression splines without a shape constraint. For any $x\in[\kappa_k,\kappa_{k+1}]$, $k = 0,\dots,K_n-1$, if $p\ge 3$,

$$E\big(\hat f^{u\,\prime\prime}(x)\big) - f''(x) = b(x) + o\big(K_n^{-p+1}\big),$$

where the leading bias term $b(x)$, which is expressed through Bernoulli polynomials, is of order $O(K_n^{-p+1})$, and $B_m(\cdot)$ is the $m$th Bernoulli polynomial, inductively defined as follows:

$$B_0(x) = 1,\qquad B_i(x) = \int_0^x iB_{i-1}(z)\,dz + b_i,\quad i\ge 1,$$

where $b_i = -i\int_0^1\int_0^x B_{i-1}(z)\,dz\,dx$ is the $i$th Bernoulli number. The variance of $\hat f^{u\,\prime\prime}(x)$ is of order $n^{-1}K_n^5$ (cf. [47, Lemma 5.4]). Similar to Lemma 8.1 given in Section 8, it can be shown that

$$\big\|\hat f^{u\,\prime\prime} - E\hat f^{u\,\prime\prime}\big\|_\infty = O_p\Big(\sqrt{n^{-1}K_n^5\log n}\Big).$$

Therefore, if $n^{-1}K_n^5\log n\to 0$ and $K_n\to\infty$ as $n\to\infty$, then $\|\hat f^{u\,\prime\prime}-f''\|_\infty = o_p(1)$. Hence, the unconstrained and constrained estimators are asymptotically equivalent. □
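The Bernoulli recursion used in the proof above can be checked with exact rational arithmetic. The short sketch below (helper names are ours) stores a polynomial as its coefficient list and applies $B_i(x) = i\int_0^xB_{i-1}(z)\,dz + b_i$ with $b_i = -i\int_0^1\int_0^xB_{i-1}(z)\,dz\,dx$; it recovers the familiar values $b_1 = -1/2$, $b_2 = 1/6$, $b_3 = 0$ and $b_4 = -1/30$.

```python
from fractions import Fraction

def antiderivative(coeffs):
    # coefficients c_0, c_1, ... of P  ->  coefficients of x -> int_0^x P(z) dz
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]

def bernoulli(m):
    """Return (B_0, ..., B_m) as coefficient lists and the numbers (b_0, ..., b_m)."""
    polys, numbers = [[Fraction(1)]], [Fraction(1)]
    for i in range(1, m + 1):
        Q = antiderivative(polys[-1])                 # int_0^x B_{i-1}(z) dz
        b_i = -i * sum(antiderivative(Q))             # evaluating at x = 1 sums the coefficients
        B_i = [i * c for c in Q]
        B_i[0] += b_i
        polys.append(B_i)
        numbers.append(b_i)
    return polys, numbers

polys, numbers = bernoulli(4)
print(numbers)   # [Fraction(1, 1), Fraction(-1, 2), Fraction(1, 6), Fraction(0, 1), Fraction(-1, 30)]
print(polys[2])  # [Fraction(1, 6), Fraction(-1, 1), Fraction(1, 1)], i.e. x^2 - x + 1/6
```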

3.2. Adaptive Estimation. In this section, we construct adaptive estimators with respect to both the sup-norm risk and the pointwise risk. These estimates have maximum risks within a constant factor of the minimax risk over $\mathcal{C}_H(r,L)$. We focus on the function class $\mathcal{C}_H(r,L)$ with $1\le r\le 2$, where the differences between the constrained and unconstrained estimates do not vanish asymptotically.
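3.2.1. Adaptive Estimation under the Sup-norm Risk. It follows from Propositions 7.1 and 7.2 that, for any $r\in[1,2]$, the bounds for the bias and the stochastic term are, respectively,

$$\sup_{f\in\mathcal{C}_H(r,L)}\|\bar f_{(r)}-f\|_\infty\le C_1LK_n^{-r}, \qquad (3.3)$$

$$P\big(\|\hat f_{(r)}-\bar f_{(r)}\|_\infty>\eta\big)\le(K_n+1)\exp\left(-\frac{n\eta^2}{C_2\sigma^2K_n}\right), \qquad (3.4)$$

where $\bar f_{(r)}$ denotes the deterministic counterpart of $\hat f_{(r)}$ (see Section 7), and $C_1$ and $C_2$ are two positive constants independent of $r\in[1,2]$. Hence the optimal number of knots $K_{(r)}$, obtained by balancing (3.3) against (3.4), satisfies

$$K_{(r)}\asymp\left(\frac{L}{\sigma}\right)^{\frac{2}{2r+1}}\left(\frac{n}{\log n}\right)^{\frac{1}{2r+1}}, \qquad (3.5)$$

and the optimal rate of convergence is

$$\psi(r)\asymp L^{\frac{1}{2r+1}}\sigma^{\frac{2r}{2r+1}}\left(\frac{\log n}{n}\right)^{\frac{r}{2r+1}}, \qquad (3.6)$$

where $\psi(r)$ is the sum of the bias bound $C_1LK_{(r)}^{-r}$ and the stochastic threshold implied by (3.4), which is of order $\sigma\sqrt{K_{(r)}\log n/n}$.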

Given $n$, let $\tau_n := \lfloor(\log n)^{1/2}\rfloor$, and let $r_j := 1 + j/\tau_n$, $j = 0,1,\dots,\tau_n$, be grid points in $[1,2]$. We consider the adaptive estimator based on the idea of Lepski [22]. Let

$$\hat k = \sup\left\{0\le k\le\tau_n : \|\hat f_{(r_k)}-\hat f_{(r_j)}\|_\infty\le\psi(r_j)\ \text{ for any } j<k\right\}.$$

Define $\hat r := r_{\hat k}$. We use $\hat f_{(\hat r)}$ for estimation in the sup-norm distance.
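The selection rule for $\hat k$ is straightforward to express in code. The sketch below assumes that the fits $\hat f_{(r_j)}$ have already been evaluated on a common grid and that the thresholds $\psi(r_j)$ are supplied by the user; it computes neither, and the function name is hypothetical. Following the definition of $\hat k$, every $k$ is tested (not only an initial run of admissible indices), and the largest admissible index is returned; $k = 0$ is always admissible.

```python
import numpy as np

def lepski_index(fits, psi):
    """fits[j]: values of the estimate with smoothness r_j on a common grid
    (r_0 < r_1 < ... < r_tau); psi[j]: threshold psi(r_j).
    Returns k_hat = max{k : ||fits[k] - fits[j]||_inf <= psi[j] for all j < k}."""
    k_hat = 0
    for k in range(len(fits)):
        if all(np.max(np.abs(fits[k] - fits[j])) <= psi[j] for j in range(k)):
            k_hat = k
    return k_hat

tau = 4                                          # grid r_j = 1 + j / tau, j = 0, ..., tau
r_grid = 1 + np.arange(tau + 1) / tau
fits = [np.zeros(5), np.full(5, 0.05), np.full(5, 0.1), np.full(5, 0.6), np.full(5, 0.12)]
psi = [0.2] * 5
k_hat = lepski_index(fits, psi)
print(k_hat, r_grid[k_hat])                      # 2 1.5
```

Here $k = 3$ fails against $j = 0$, and $k = 4$ fails against $j = 3$, so $\hat k = 2$ and $\hat r = 1.5$.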



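Theorem 3.4. The estimator $\hat f_{(\hat r)}$ is a rate-adaptive estimator on $\mathcal{C}_H(r,L)$ for the sup-norm distance, i.e., there exists a positive constant $\pi_2$ such that

$$\sup_{r\in[1,2]}\ \sup_{f\in\mathcal{C}_H(r,L)}E\big[\|\hat f_{(\hat r)}-f\|_\infty\big]\le\pi_2L^{\frac{1}{2r+1}}\sigma^{\frac{2r}{2r+1}}\left(\frac{\log n}{n}\right)^{\frac{r}{2r+1}}.$$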

3.2.2. Pointwise Adaptive Estimation. In this section, we construct an estimator that attains the minimax rate of convergence for a whole range of values of $r\in[1,2]$ and $L$. In the context of convex regression, unlike in earlier work on pointwise adaptive estimation, a fully adaptive procedure can be obtained.

We explore the idea of Low and Kang [25] to construct an adaptive estimate of $f(x_0)$ for any given $x_0\in(0,1)$. Given the observation data $(y_i)_{i=1}^n$, let

$$\bar y_k := \frac{\sum_{i=1}^n y_i\,I(\kappa_{k-1}<x_i\le\kappa_k)}{\sum_{i=1}^n I(\kappa_{k-1}<x_i\le\kappa_k)},\qquad k = 1,\dots,K_n. \qquad (3.7)$$

Then $\hat b = (\hat b_1,\dots,\hat b_{K_n})^\top$ minimizes $\sum_{k=1}^{K_n}(b_k-\bar y_k)^2$ subject to the convex constraint $\Delta^2b_k\ge 0$, $k = 3,\dots,K_n$. This indicates that a piecewise constant spline with $p = 0$ is used to fit the data in (2.1). Recall that $M_n = n/K_n$. Let

$$\zeta_k := \frac{(k-1)M_n + (M_n+1)/2}{n},\qquad k = 1,\dots,K_n.$$

Hence, $\zeta_k$ is the average of the design points in $(\kappa_{k-1},\kappa_k]$. Let $\tilde f$ denote the piecewise linear function that interpolates $(\zeta_k,\hat b_k)$, $k = 1,\dots,K_n$.
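The binning in (3.7) and the points $\zeta_k$ can be sketched as follows. Because $x_i = i/n$ and $M_n = n/K_n$ is an integer, the $k$th bin $(\kappa_{k-1},\kappa_k]$ contains exactly the indices $i = (k-1)M_n+1,\dots,kM_n$, so $\bar y_k$ and $\zeta_k$ are block averages. For the interpolation step, the sketch uses noise-free data from a convex $f$ (an illustrative assumption): averages of a convex function over equally spaced bins have nonnegative second differences, so in that case the constrained minimizer $\hat b$ equals $\bar y$ and no solver is needed. `np.interp` holds the end values constant outside $[\zeta_1,\zeta_{K_n}]$, which is a choice of this sketch rather than of the paper.

```python
import numpy as np

def bin_means(y, K):
    """Return (ybar_k, zeta_k), k = 1..K, for design points x_i = i/n with n = K * M."""
    n = len(y)
    M = n // K
    if M * K != n:
        raise ValueError("n / K_n must be an integer")
    x = np.arange(1, n + 1) / n
    ybar = y.reshape(K, M).mean(axis=1)      # (3.7): mean of y_i with x_i in (kappa_{k-1}, kappa_k]
    zeta = x.reshape(K, M).mean(axis=1)      # average design point of bin k
    return ybar, zeta

n, K = 600, 12
M = n // K
x = np.arange(1, n + 1) / n
ybar, zeta = bin_means((x - 0.3) ** 2, K)    # noise-free convex data
k = np.arange(1, K + 1)
print(np.allclose(zeta, ((k - 1) * M + (M + 1) / 2) / n))   # True
print(np.diff(ybar, 2).min() >= 0)                           # True: ybar is already convex
b_hat = ybar

def f_tilde(t):
    # piecewise linear interpolant of (zeta_k, b_hat_k)
    return np.interp(t, zeta, b_hat)
```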

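Fix $x_0\in(0,1)$. For each $n$, let $d_n\in\mathbb{N}$ satisfy $\zeta_{d_n}<x_0\le\zeta_{d_n+1}$. Let $\{K_{n,j}\}_{j\ge 1}$ be the candidate numbers of bins, where $d_n$ depends on $j$. Further, we let $\bar y_{k,j}$ denote the $\bar y_k$ defined in (3.7) corresponding to a given $K_{n,j}$, and let $\tilde f_j$ be the estimator $\tilde f$ corresponding to $K_{n,j}$. Fix a real number $\lambda>0$ such that $P(Z>\lambda)<1/4$, where $Z$ is a standard normal random variable. Set

$$I_j := 1\big(\Delta\bar y_{d_n+4,j}-\Delta\bar y_{d_n-2,j}\le\lambda2^{j}n^{-1/2}\sigma\big)\,1\big(\Delta\bar y_{d_n+4,j+1}-\Delta\bar y_{d_n-2,j+1}>\lambda2^{j+1}n^{-1/2}\sigma\big).$$

Note that exactly one $I_j\ne 0$, and thus the collection $\{I_j\}$ provides a selection procedure for $K_{n,j}$. The adaptive estimator is given by

$$\hat f(x_0) = \sum_{j=1}^{\infty}\tilde f_j(x_0)\,I_j. \qquad (3.8)$$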

Theorem 3.5. The estimator in (3.8) is a rate-adaptive estimator under the pointwise risk, i.e., for any $x_0\in(0,1)$, there exists a positive constant $\pi_3$ such that

$$\sup_{r\in[1,2]}\ \sup_{f\in\mathcal{C}_H(r,L)}E|\hat f(x_0)-f(x_0)|^2\le\pi_3L^{\frac{2}{2r+1}}\sigma^{\frac{4r}{2r+1}}n^{-\frac{2r}{2r+1}}. \qquad (3.9)$$

3.3. Variance Estimation. In practice, the variance $\sigma^2$ is replaced by an estimated variance $\hat\sigma^2$ in the above adaptive procedures. We briefly study the asymptotic properties of the maximum likelihood estimator of $\sigma^2$. Given the observation data $y = (y_1,\dots,y_n)^\top\in\mathbb{R}^n$ at design points $x = (x_1,\dots,x_n)^\top\in\mathbb{R}^n$, let $f_y := \big(\hat f^{[p]}(x_1),\dots,\hat f^{[p]}(x_n)\big)^\top$ with $p = \lceil r-1\rceil$ and $f := (f(x_1),\dots,f(x_n))^\top$. Let $\alpha(y)$ be the index set corresponding to the optimal coefficient $\hat b(X^\top y/\beta_n)$ defined in Section 2.1. Then for fixed $K_n$ and $p$, we have $f_y = A_{\alpha(y)}y$, where

$$A_{\alpha(y)} = XF_{\alpha(y)}^\top\big(F_{\alpha(y)}X^\top XF_{\alpha(y)}^\top\big)^{-1}F_{\alpha(y)}X^\top\in\mathbb{R}^{n\times n}. \qquad (3.10)$$

It follows from a discussion similar to that in Section 2.1 that $f_y:\mathbb{R}^n\to\mathbb{R}^n$ is a continuous piecewise linear function, where each linear selection function is defined by $A_\alpha$. The MLE of $\sigma^2$ is $\hat\sigma^2 = \|y-A_{\alpha(y)}y\|_2^2/n$.
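For a fixed index set $\alpha$, the matrix $A_\alpha$ in (3.10) is the orthogonal projection onto the column space of $XF_\alpha^\top$, so each linear piece of $f_y$ is a least squares fit and $\hat\sigma^2$ is a residual mean square. The sketch below assumes $p = 1$, a fixed hypothetical $\alpha$ instead of the data-dependent $\alpha(y)$, and an SVD basis for $F_\alpha$; it checks symmetry, idempotence and $\operatorname{tr}A_\alpha = K_n+p-|\alpha|$, and evaluates $\hat\sigma^2$.

```python
import numpy as np

def hat_design(n, K):
    x = np.arange(1, n + 1) / n
    centers = np.arange(K + 1) / K
    return np.maximum(0.0, 1.0 - K * np.abs(x[:, None] - centers[None, :]))

n, K, sigma = 500, 10, 0.2
rng = np.random.default_rng(4)
x = np.arange(1, n + 1) / n
y = np.exp(x) + sigma * rng.standard_normal(n)
X = hat_design(n, K)

alpha = [0, 3, 4, 7]                                   # hypothetical fixed index set (0-based)
D2 = np.zeros((K - 1, K + 1))
for r in range(K - 1):
    D2[r, r:r + 3] = [1.0, -2.0, 1.0]
F = np.linalg.svd(D2[alpha])[2][len(alpha):]           # rows span {b : (D_2 b)_alpha = 0}

A = X @ F.T @ np.linalg.solve(F @ X.T @ X @ F.T, F @ X.T)   # (3.10) for this fixed alpha
sigma2_hat = np.sum((y - A @ y) ** 2) / n                   # residual mean square
print(np.allclose(A, A.T), np.allclose(A @ A, A), round(np.trace(A), 6))
print(sigma2_hat)
```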

Theorem 3.6. Assume $f\in\mathcal{C}_H(r,L)$, $K_n\to\infty$ as $n\to\infty$, and let $p = \lceil r-1\rceil$. If $K_n = o(n)$, then $\hat\sigma^2\to\sigma^2$ in probability, and if $K_n = o(\sqrt n)$, then $\sqrt n(\hat\sigma^2-\sigma^2)$ is asymptotically normal with mean zero and variance $2\sigma^4$. Furthermore, if $K_n$ is of order $n^{1/(2r+1)}$, then $|E(\hat\sigma^2-\sigma^2)| = O\big(n^{-2r/(2r+1)}\big)$.

Variance estimation based on the differences of successive points was originally studied by Rice [33]. Compared to this estimator and its generalizations, the MLE $\hat\sigma^2$ has a smaller asymptotic variance but a slightly larger bias. Meyer and Woodroofe [28] studied a bias-reduced variance estimator for a monotone regression model using the least squares method. A bias-reduced variance estimator for the convex spline model is nontrivial and will be addressed in a future paper.
4. Discussion. We have considered B-spline estimators for convex regression in this paper. The proposed estimator and asymptotic analysis techniques can be extended to other shape-restricted inference problems. For example, it is known that the uniform Lipschitz property (in the $\ell_\infty$-norm) holds for the monotone constraint [37, 43]. Therefore, minimax optimal convergence rates and adaptive estimators can be established in a similar manner. It is conjectured that the uniform Lipschitz property holds for higher-order difference constraints. However, its development is much more involved and will be reported in the future.

We have provided optimal rate-adaptive estimators for convex regression under both the sup-norm and the pointwise risks. Nonetheless, the explicit construction of asymptotically exact adaptive estimators over the Hölder classes remains an open question. Other interesting research directions include adaptive confidence bands and hypothesis tests in convex estimation [10, 12].

5. Proofs for Section 2.

5.1. Proof of Theorem 2.1.

Proof of Theorem 2.1. Write (2.3) as $\min_{b\in\Omega}g(b)$, where the objective function is $g(b) := \frac12b^\top\Lambda b - b^\top\bar y$. It is clear that $g$ is coercive on $\mathbb{R}^{K_n+p}$ and strictly convex on the closed convex set $\Omega$. This ensures the existence and uniqueness of an optimal solution. Furthermore, since $\Omega$ is a polyhedral cone, it is finitely generated by $\{v^1,-v^1,v^2,-v^2,v^3,v^4,\dots,v^{K_n+p}\}$. Here, for each $k = 3,\dots,K_n+p$,

$$v^k := \big(\underbrace{0,\dots,0}_{(k-1)\text{ copies}},1,2,\dots,K_n+p-k+1\big)^\top,$$

and for $k = 1,2$,

$$v^1 := \big(1,0,-1,-2,\dots,-(K_n+p-2)\big)^\top,\qquad v^2 := \big(0,1,2,3,\dots,K_n+p-1\big)^\top. \qquad (5.1)$$

It is easy to see that $\Delta^2v_j^k = 0$ for $k = 1,2$ and all $j\ge 3$. Hence $\pm v^k\in\Omega$ for $k = 1,2$, and it can also be verified that $\sum_{k=1}^2v^k = (1,\dots,1)^\top$. Further, any $b = (b_1,\dots,b_{K_n+p})^\top\in\Omega$ can be positively generated as

$$b = \sum_{i=1}^{2}\big(\max(0,b_i)\,v^i + \max(0,-b_i)(-v^i)\big) + \sum_{i=3}^{K_n+p}\Delta^2(b_i)\,v^i.$$
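The positive generation of $\Omega$ above is a discrete Taylor expansion: the two free generators reproduce the linear part through $b_1$ and $b_2$, and each hinge $v^k$ contributes $\Delta^2b_k$. The NumPy sketch below (an illustration only, with 1-based generator indices as in (5.1)) rebuilds a convex coefficient vector with some negative entries from these generators and checks that $v^1+v^2 = (1,\dots,1)^\top$ and $D_2v^k = e_{k-2}$ for $k\ge 3$.

```python
import numpy as np

def generator(k, m):
    """v^k in R^m from (5.1), with 1-based k."""
    j = np.arange(1, m + 1)
    if k == 1:
        return (2 - j).astype(float)                 # (1, 0, -1, ..., -(m - 2))
    if k == 2:
        return (j - 1).astype(float)                 # (0, 1, 2, ..., m - 1)
    return np.maximum(0, j - k + 1).astype(float)    # (k - 1) zeros, then 1, 2, ..., m - k + 1

m = 9
rng = np.random.default_rng(5)
b = np.cumsum(np.cumsum(rng.random(m))) - 3.0        # Delta^2 b_k > 0, so b lies in Omega
d2 = b[2:] - 2 * b[1:-1] + b[:-2]                    # Delta^2 b_k, k = 3, ..., m

rebuilt = sum(max(0.0, b[i - 1]) * generator(i, m) + max(0.0, -b[i - 1]) * (-generator(i, m))
              for i in (1, 2))
rebuilt = rebuilt + sum(d2[k - 3] * generator(k, m) for k in range(3, m + 1))

D2 = np.zeros((m - 2, m))
for r in range(m - 2):
    D2[r, r:r + 3] = [1.0, -2.0, 1.0]
print(np.allclose(rebuilt, b))                                        # True
print(np.allclose(generator(1, m) + generator(2, m), 1.0))            # True
print(all(np.allclose(D2 @ generator(k, m), np.eye(m - 2)[k - 3])
          for k in range(3, m + 1)))                                  # True
```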

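By using these generators for $\Omega$, we obtain the following necessary and sufficient optimality conditions for an optimizer $b$:

$$0\le D_2b\ \perp\ \tilde C\,\nabla g(b)\ \ge 0\quad\text{and}\quad\langle v^k,\nabla g(b)\rangle = 0,\ \ \forall k = 1,2, \qquad (5.2)$$

where $D_2\in\mathbb{R}^{(K_n+p-2)\times(K_n+p)}$ is given by

$$D_2 = \begin{bmatrix}1&-2&1&&&\\ &1&-2&1&&\\ &&\ddots&\ddots&\ddots&\\ &&&1&-2&1\end{bmatrix}, \qquad (5.3)$$

and $\tilde C\in\mathbb{R}^{(K_n+p-2)\times(K_n+p)}$, whose rows are $(v^3)^\top,\dots,(v^{K_n+p})^\top$, is given by

$$\tilde C = \begin{bmatrix}0&0&1&2&\cdots&K_n+p-4&K_n+p-3&K_n+p-2\\ 0&0&0&1&\cdots&K_n+p-5&K_n+p-4&K_n+p-3\\ \vdots&&&&\ddots&&&\vdots\\ 0&0&0&0&\cdots&0&0&1\end{bmatrix}.$$

It can be shown via the definitions of $v^1$ and $v^2$ in (5.1) that the second optimality condition in (5.2) can be equivalently written as

$$\sum_{i=1}^{K_n+p}(\nabla g(b))_i = 0\quad\text{and}\quad\sum_{i=1}^{K_n+p}(K_n+p-i+1)(\nabla g(b))_i = 0,$$

where $\nabla g(b) = \Lambda b-\bar y$. This gives rise to the two boundary conditions. Moreover, for any $k = 3,\dots,K_n+p$, these boundary conditions and the definition of $v^k$ in (5.1) yield

$$\langle v^k,\nabla g(b)\rangle = \sum_{i=1}^{k-2}\sum_{j=1}^{i}(\nabla g(b))_j,$$

and we obtain the equivalent condition for the first optimality condition in (5.2):

$$0\le D_2b\ \perp\ C'\,\nabla g(b)\ \ge 0, \qquad (5.4)$$

where $C'\in\mathbb{R}^{(K_n+p-2)\times(K_n+p)}$ has $(k,j)$ entry $\max(0,k-j+1)$, i.e., $C'$ consists of the first $K_n+p-2$ rows of $C^2$. Since $C' = C_{\mathcal{I}\bullet}C$, the proof is complete. □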
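The step from (5.2) to (5.4) rests on the identity $\langle v^k,u\rangle = (C_{\mathcal I\bullet}Cu)_{k-2}$, which holds whenever $u$ satisfies the two boundary conditions. The sketch below checks it for a random vector $u$ standing in for $\nabla g(b)$ (an illustrative assumption), after projecting $u$ onto the subspace where both boundary sums vanish; $C$ is the lower-triangular matrix of ones from Section 2.

```python
import numpy as np

m = 10                                            # m plays the role of K_n + p
rng = np.random.default_rng(6)
i = np.arange(1, m + 1)
B = np.column_stack([np.ones(m), m - i + 1.0])    # columns give sum u_i and sum (m - i + 1) u_i
u = rng.standard_normal(m)
u -= B @ np.linalg.lstsq(B, u, rcond=None)[0]     # enforce both boundary conditions

C = np.tril(np.ones((m, m)))                      # lower-triangular matrix of ones
lhs = np.array([np.maximum(0, i - k + 1) @ u for k in range(3, m + 1)])   # <v^k, u>, k = 3..m
rhs = (C @ C @ u)[: m - 2]                        # (C_{I.} C u)_{k-2}
print(np.allclose(B.T @ u, 0), np.allclose(lhs, rhs))   # True True
```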



5.2. Construction and Proof for Proposition 2.1. We first construct certain equations that yield a linear selection function corresponding to the index set $\alpha = \{i\mid(D_2b)_i = 0\}\subseteq\{1,\dots,K_n+p-2\}$ ($\alpha$ may be empty). Specifically, for the given $b$ and $\alpha$, we define a vector $b^\alpha$ and an associated family of index sets $\{\beta_i^\alpha\}$ in the following steps:

(1) Let $\ell_1 := \min_{3\le i\le K_n+p}\{i : \Delta^2(b_i) = 0\}$ and $\bar\ell_1 := \max_{\ell_1\le k\le K_n+p}\{k : \Delta^2(b_i) = 0,\ \forall i = \ell_1,\dots,k\}$. Then inductively define, for $j\ge 1$,

$$\ell_{j+1} := \min_{\bar\ell_j<i\le K_n+p}\{i : \Delta^2(b_i) = 0\},\qquad\bar\ell_{j+1} := \max_{\ell_{j+1}\le k\le K_n+p}\{k : \Delta^2(b_i) = 0,\ \forall i = \ell_{j+1},\dots,k\}.$$

Suppose that we obtain $q$ such pairs, namely, $\ell_1,\dots,\ell_q$ and $\bar\ell_1,\dots,\bar\ell_q$. Define $\beta_j^\alpha := \{i : \ell_j-2\le i\le\bar\ell_j\}$ for $j = 1,\dots,q$. Note that $|\beta_j^\alpha|\ge 3$ for each $j$, and for two consecutive index sets, $\ell_{j+1}\ge\bar\ell_j+2$. Thus if the equality holds, then $\beta_j^\alpha\cap\beta_{j+1}^\alpha = \{\bar\ell_j\}$; otherwise, the two consecutive index sets are disjoint.

(2) Let $L := K_n+p+q-|\cup_{j=1}^q\beta_j^\alpha|$, where $|\cdot|$ denotes the cardinality of an index set. For each $i\in\{1,\dots,K_n+p\}\setminus\cup_{j=1}^q\beta_j^\alpha$, define a singleton $\beta_s^\alpha := \{i\}$, where $s = q+1,\dots,L$.

(3) This step arranges the index sets $\beta_s^\alpha$ in a monotone order as follows. For each $s$, let $\min(\beta_s^\alpha)$ denote the least element in $\beta_s^\alpha$ (similar notation is used for $\max$ below). Define $s_1 := \arg\min_{s=1,\dots,L}\{\min(\beta_s^\alpha)\}$ and let $\beta_1^\circ := \beta_{s_1}^\alpha$. Then inductively define, for each $j\ge 1$, $\beta_{j+1}^\circ := \beta_{s_{j+1}}^\alpha$, where $s_{j+1} := \arg\min_{s\notin\{s_1,\dots,s_j\}}\{\min(\beta_s^\alpha)\}$.

(4) In this step, we regroup the index sets $\beta_j^\circ$ in a way that preserves desired structural properties to be used in the subsequent development. Define $p_0 := 0$ and

$$p_1 := \max\big(1,\ \max\{k>1 : \beta_j^\circ\cap\beta_{j+1}^\circ\ne\emptyset,\ \forall j = 1,\dots,k-1\}\big),$$

and $\beta_1^\alpha := \cup_{j=1}^{p_1}\beta_j^\circ$, with the companion index set $\vartheta_1 := \{\min(\beta_j^\circ),\ \forall j = 1,\dots,p_1\}\cup\{\max(\beta_{p_1}^\circ)\}$. Recursively define, for each $s\ge 1$,

$$p_{s+1} := \max\big(p_s+1,\ \max\{k>p_s+1 : \beta_j^\circ\cap\beta_{j+1}^\circ\ne\emptyset,\ \forall j = p_s+1,\dots,k-1\}\big),$$

and $\beta_{s+1}^\alpha := \cup_{j=p_s+1}^{p_{s+1}}\beta_j^\circ$, with the companion index set $\vartheta_{s+1} := \{\min(\beta_j^\circ),\ \forall j = p_s+1,\dots,p_{s+1}\}\cup\{\max(\beta_{p_{s+1}}^\circ)\}$. Without loss of generality, we assume that the elements of each $\vartheta_s$ are in strictly increasing order. Hence, any two consecutive elements of $\vartheta_s$ correspond to $\ell_j-2$ and $\bar\ell_j$ defined in Step (1) for some $j$.

(5) Suppose that there are $L$ such index sets $\vartheta_s$, and let $\vartheta := \cup_{s=1}^{L}\vartheta_s$, whose elements are in strictly increasing order. Then $b^\alpha := (b_i)_{i\in\vartheta}$.

It is clear from the above construction that $\{\beta_i^\alpha\}$ forms a finite and disjoint partition of $\{1,\dots,K_n+p\}$, namely, $\cup_{i=1}^{L}\beta_i^\alpha = \{1,\dots,K_n+p\}$ and $\beta_j^\alpha\cap\beta_k^\alpha = \emptyset$ whenever $j\ne k$.
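Step (1) of the construction scans the second differences for maximal runs of zeros. The sketch below, with hypothetical helper names and an optional tolerance `tol` for deciding when a computed $\Delta^2b_i$ counts as zero (an assumption; the paper works with exact zeros), returns the pairs $(\ell_j,\bar\ell_j)$ and the sets $\beta_j^\alpha$ using the paper's 1-based indices. In the example, $\ell_{j+1} = \bar\ell_j+2$ for both consecutive pairs, so consecutive sets share exactly the element $\bar\ell_j$.

```python
def zero_runs(d2, tol=0.0):
    """d2[i] = Delta^2 b_i for i = 3, ..., K_n + p (dict keyed by the 1-based index i).
    Returns the pairs (ell_j, ell_bar_j) of Step (1)."""
    runs, i, last = [], 3, max(d2)
    while i <= last:
        if abs(d2[i]) <= tol:
            start = i
            while i + 1 <= last and abs(d2[i + 1]) <= tol:
                i += 1
            runs.append((start, i))
        i += 1
    return runs

def beta_sets(runs):
    # beta_j = {i : ell_j - 2 <= i <= ell_bar_j}, listed in increasing order
    return [list(range(lo - 2, hi + 1)) for lo, hi in runs]

values = [1.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0]    # Delta^2 b_i for i = 3, ..., 10
d2 = dict(zip(range(3, 11), values))
runs = zero_runs(d2)
print(runs)              # [(4, 5), (7, 7), (9, 10)]
print(beta_sets(runs))   # [[2, 3, 4, 5], [5, 6, 7], [7, 8, 9, 10]]
```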

For a given index set $\alpha$, we drop the sign restriction (i.e., the inequality $\ge 0$) in (2.4) and obtain the corresponding linear selection function from the following (possibly redundant) equations:

$$(D_2b)_\alpha = 0, \qquad (5.5a)$$

$$D_2b\ \perp\ C_{\mathcal{I}\bullet}C(\Lambda b-\bar y), \qquad (5.5b)$$

$$C_{\bar\alpha\bullet}C(\Lambda b-\bar y) = 0, \qquad (5.5c)$$

$$C_{(K_n+p)\bullet}(\Lambda b-\bar y) = C_{(K_n+p)\bullet}C(\Lambda b-\bar y) = 0, \qquad (5.5d)$$

where $\bar\alpha := \{1,\dots,K_n+p-2\}\setminus\alpha$. Indeed, we shall use equations (5.5a), (5.5b), and (5.5d) to characterize a linear piece. Let $b^\alpha$ denote the vector constituting the free variables of equation (5.5a). With this construction and notation, we are ready to prove Proposition 2.1 as follows.

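Proof of Proposition 2.1. We first introduce some notation. Let $m_i^\alpha := |\beta_i^\alpha|$ and $h_i^\alpha := m_i^\alpha-1$, where $i = 1,\dots,L$. Note that if $m_i^\alpha>1$, then $m_i^\alpha\ge 3$, so that $h_i^\alpha\ge 2$. It follows from the definition of $\beta_i^\alpha$ that $b = (F_\alpha)^\top b^\alpha$, where the matrix

$$F_\alpha := \begin{bmatrix}F_{\alpha,1}&&\\ &\ddots&\\ &&F_{\alpha,L}\end{bmatrix}\in\mathbb{R}^{\ell\times(K_n+p)}, \qquad (5.6)$$

and each matrix block $F_{\alpha,k}$ corresponding to $\beta_k^\alpha$ is given as follows: if $m_k^\alpha = 1$, then $F_{\alpha,k} = 1$; otherwise, assuming without loss of generality that the elements $\vartheta_k(1)<\cdots<\vartheta_k(|\vartheta_k|)$ of $\vartheta_k$ in Step (5) above are in strictly increasing order, and letting $h_{k,j} := \vartheta_k(j+1)-\vartheta_k(j)\ge 2$ for each $j = 1,\dots,|\vartheta_k|-1$, we determine $F_{\alpha,k}\in\mathbb{R}^{|\vartheta_k|\times m_k^\alpha}$ from $\vartheta_k$ constructed in Steps (1)-(5) as

$$F_{\alpha,k} := \begin{bmatrix}\rho_{\alpha,k,1}\\ \vdots\\ \rho_{\alpha,k,w_k+1}\end{bmatrix}, \qquad (5.7)$$

where $w_k := |\vartheta_k|-1$ and the row vector $\rho_{\alpha,k,j}$, indexed by $i\in\beta_k^\alpha$, is the discrete hat function centered at $\vartheta_k(j)$:

$$(\rho_{\alpha,k,j})_i = \begin{cases}\dfrac{i-\vartheta_k(j-1)}{h_{k,j-1}}, & \vartheta_k(j-1)\le i\le\vartheta_k(j),\\[6pt] \dfrac{\vartheta_k(j+1)-i}{h_{k,j}}, & \vartheta_k(j)\le i\le\vartheta_k(j+1),\\[6pt] 0, & \text{otherwise},\end{cases}$$

with the conventions that the first case is void for $j = 1$ and the second case is void for $j = w_k+1$. In other words, $b = F_\alpha^\top b^\alpha$ interpolates $b^\alpha$ linearly between consecutive elements of each $\vartheta_k$.

For notational simplicity, let $v := \Lambda b-\bar y$. In view of the complementarity condition in (2.4), we have $(D_2b)^\top C_{\mathcal{I}\bullet}Cv = 0$. Since $b = (F_\alpha)^\top b^\alpha$, we get $(b^\alpha)^\top F_\alpha(D_2^\top C_{\mathcal{I}\bullet}Cv) = 0$. Moreover, it can be further verified that

$$D_2^\top C_{\mathcal{I}\bullet}C = \begin{bmatrix}I_{K_n+p-2}&0_{(K_n+p-2)\times 2}\\ E&0_{2\times 2}\end{bmatrix},$$

where

$$E = \begin{bmatrix}-(K_n+p-1)&-(K_n+p-2)&\cdots&-2\\ K_n+p-2&K_n+p-3&\cdots&1\end{bmatrix}\in\mathbb{R}^{2\times(K_n+p-2)}.$$

It also follows from the boundary conditions $C_{(K_n+p)\bullet}v = C_{(K_n+p)\bullet}Cv = 0$ and elementary row operations that $[-E\ \ I_2]\,v = 0$. Therefore, we obtain $D_2^\top C_{\mathcal{I}\bullet}Cv = I_{K_n+p}v = v$. Hence, $(b^\alpha)^\top F_\alpha(D_2^\top C_{\mathcal{I}\bullet}Cv) = (b^\alpha)^\top F_\alpha v = 0$. Recall that for the given index set $\alpha$, $b^\alpha$ corresponds to the free variables of equation (5.5a). Hence, $b^\alpha$ is arbitrary, so that $F_\alpha v = 0$. This leads to

$$F_\alpha\Lambda(F_\alpha)^\top b^\alpha = F_\alpha\bar y.$$

Letting $\Lambda^\alpha := F_\alpha\Lambda(F_\alpha)^\top$ and $\bar y^\alpha := F_\alpha\bar y$, we obtain the linear equation $\Lambda^\alpha b^\alpha = \bar y^\alpha$. Since $F_\alpha$ has full row rank and $\Lambda$ is positive definite, $\Lambda^\alpha$ is positive definite and hence invertible. Consequently, we have $\hat b^\alpha(\bar y) = F_\alpha^\top b^\alpha = F_\alpha^\top(F_\alpha\Lambda F_\alpha^\top)^{-1}F_\alpha\bar y$. □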
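The matrix identity for $D_2^\top C_{\mathcal I\bullet}C$ and the relation $[-E\ \ I_2]v = 0$ used in the proof can be confirmed numerically for a small dimension. In the sketch below, $C$ is the lower-triangular matrix of ones from Section 2, $m$ plays the role of $K_n+p$, and $v$ is a random vector projected onto the two boundary conditions (an illustrative stand-in for $\Lambda b-\bar y$).

```python
import numpy as np

m = 8                                              # m plays the role of K_n + p
D2 = np.zeros((m - 2, m))
for r in range(m - 2):
    D2[r, r:r + 3] = [1.0, -2.0, 1.0]
C = np.tril(np.ones((m, m)))
M = D2.T @ (C @ C)[: m - 2]                        # D_2^T C_{I.} C

idx = np.arange(1, m - 1)                          # i = 1, ..., m - 2
E = np.vstack([-(m - idx), m - 1 - idx]).astype(float)   # rows (-(m-1), ..., -2) and (m-2, ..., 1)
expected = np.zeros((m, m))
expected[: m - 2, : m - 2] = np.eye(m - 2)
expected[m - 2:, : m - 2] = E
print(np.allclose(M, expected))                    # True

rng = np.random.default_rng(7)
B = np.column_stack([np.ones(m), m - np.arange(1, m + 1) + 1.0])
v = rng.standard_normal(m)
v -= B @ np.linalg.lstsq(B, v, rcond=None)[0]      # C_{m.} v = C_{m.} C v = 0
print(np.allclose(np.hstack([-E, np.eye(2)]) @ v, 0), np.allclose(M @ v, v))   # True True
```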

6. Proof of Theorem 3.1. We divide the proof of Theorem 3.1 into several steps. We first establish a result pertaining to $F_\alpha F_\alpha^\top$.

Lemma 6.1. For any $K_n$ and $\alpha$, $F_\alpha F_\alpha^\top$ is a strictly diagonally dominant, nonnegative, tridiagonal matrix.
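Lemma 6.1 can be sanity-checked on a single block. Assuming that each row of $F_{\alpha,k}$ is the discrete hat function centered at an element of $\vartheta_k$ (the interpolation structure of $F_\alpha^\top b^\alpha$ from Section 5.2), the sketch below builds $F_{\alpha,k}$ from hypothetical gaps $h_{k,j}$, forms $F_{\alpha,k}F_{\alpha,k}^\top$, and compares its entries with the closed forms used in the proof: $1+\frac{(h-1)(2h-1)}{6h}$ for the first and last diagonal entries, the sum of two such terms in between, and $\frac{h^2-1}{6h}$ next to the diagonal. It then tests tridiagonality, nonnegativity and strict diagonal dominance.

```python
import numpy as np

def hat_rows(theta):
    """Rows rho_j: discrete hat functions on the integers theta[0], ..., theta[-1],
    equal to 1 at theta[j] and linear between neighbouring elements of theta."""
    pos = np.arange(theta[0], theta[-1] + 1)
    F = np.zeros((len(theta), len(pos)))
    for j, t in enumerate(theta):
        if j > 0:                                   # rising part on [theta[j-1], theta[j]]
            left = theta[j - 1]
            mask = (pos >= left) & (pos <= t)
            F[j, mask] = (pos[mask] - left) / (t - left)
        if j < len(theta) - 1:                      # falling part on [theta[j], theta[j+1]]
            right = theta[j + 1]
            mask = (pos >= t) & (pos <= right)
            F[j, mask] = (right - pos[mask]) / (right - t)
    return F

def q(g):
    # sum_{t=1}^{g-1} (t/g)^2 = (g - 1)(2g - 1) / (6g)
    return (g - 1) * (2 * g - 1) / (6 * g)

theta = [4, 7, 9, 13, 15]                           # hypothetical vartheta_k, gaps h = 3, 2, 4, 2
h = np.diff(theta)
G = hat_rows(theta) @ hat_rows(theta).T
print(np.isclose(G[0, 0], 1 + q(h[0])), np.isclose(G[-1, -1], 1 + q(h[-1])))
print(np.allclose(np.diag(G, 1), (h ** 2 - 1) / (6 * h)))
print(np.allclose(np.diag(G)[1:-1], 1 + q(h[:-1]) + q(h[1:])))
print(np.allclose(np.triu(G, 2), 0))                # tridiagonal
off = np.abs(G).sum(axis=1) - np.diag(G)
print(np.all(np.diag(G) > off))                     # strictly diagonally dominant
```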
Proof. Recall that $\ell := K_n + p - |\alpha|$. For notational simplicity, let $G := F_\alpha F_\alpha^T$. First of all, it is easy to verify via (5.6) and (5.7) that $G$ is the $\ell\times\ell$ tridiagonal matrix given by

$$G = \begin{bmatrix} d_{11} & \eta_1 & & \\ \eta_1 & d_{22} & \ddots & \\ & \ddots & \ddots & \eta_{\ell-1} \\ & & \eta_{\ell-1} & d_{\ell\ell} \end{bmatrix}.$$



The entries on the three diagonal bands are determined as follows. Consider $F_\alpha$ in (5.6) with $L$ blocks. Fix $k \in \{1, \ldots, L\}$. If $m_k = 1$, then $F_{\alpha,k}F_{\alpha,k}^T$ is a real number that appears on the diagonal of $G$. Denoting this number by $d_{ss}$, we have $d_{ss} = F_{\alpha,k}F_{\alpha,k}^T = 1$, $G_{s(s+1)} = G_{(s+1)s} = 0$, and $G_{sj} = 0$ for all $j \le s-2$ and $j \ge s+2$. If $m_k > 1$, then $F_{\alpha,k}F_{\alpha,k}^T$ is a symmetric, positive definite matrix of order $w_k + 1$ that forms a diagonal block of $G$. Making use of the structure of $F_{\alpha,k}$ given in the proof of Proposition 2.1 and a somewhat lengthy computation, we obtain the following results in two separate cases (recalling $w_k := |\beta_k| - 1$):

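(1) $k = 1$ or $k = L$. For $k = 1$,

$$d_{11} = 1 + \frac{(h_{1,1}-1)(2h_{1,1}-1)}{6h_{1,1}},$$

$$G_{s(s+1)} = G_{(s+1)s} = \frac{h_{1,s}^2 - 1}{6h_{1,s}}, \quad \forall\, s = 1, \ldots, w_1,$$

$$d_{ss} = \frac{(h_{1,s-1}+1)(2h_{1,s-1}+1)}{6h_{1,s-1}} + \frac{(h_{1,s}-1)(2h_{1,s}-1)}{6h_{1,s}}, \quad \forall\, s = 2, \ldots, w_1,$$

$$d_{(w_1+1)(w_1+1)} = \frac{(h_{1,w_1}+1)(2h_{1,w_1}+1)}{6h_{1,w_1}}.$$

Besides, $G_{(w_1+1)(w_1+2)} = G_{(w_1+2)(w_1+1)} = 0$, and for each $s = 1, \ldots, w_1$, $G_{sj} = 0$ for all $j \le s-2$ and $j \ge s+2$. For $k = L$, similar results can be established by using the symmetry of the rows of $F_{\alpha,L}$.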
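(2) $k \in \{2, \ldots, L-1\}$. In this case, suppose that the $(1,1)$-element of $F_{\alpha,k}F_{\alpha,k}^T$, which is a diagonal entry of $G$, is denoted by $d_{tt}$. Then we have

$$d_{tt} = 1 + \frac{(h_{k,1}-1)(2h_{k,1}-1)}{6h_{k,1}},$$

$$G_{(t+s-1)(t+s)} = G_{(t+s)(t+s-1)} = \frac{h_{k,s}^2 - 1}{6h_{k,s}}, \quad \forall\, s = 1, \ldots, w_k,$$

$$d_{(t+s)(t+s)} = \frac{(h_{k,s}+1)(2h_{k,s}+1)}{6h_{k,s}} + \frac{(h_{k,s+1}-1)(2h_{k,s+1}-1)}{6h_{k,s+1}}, \quad \forall\, s = 1, \ldots, w_k - 1,$$

$$d_{(t+w_k)(t+w_k)} = \frac{(h_{k,w_k}+1)(2h_{k,w_k}+1)}{6h_{k,w_k}}.$$

In addition, for each $s = t, \ldots, t+w_k$, $G_{sj} = 0$ for all $j \le s-2$ and $j \ge s+2$, and $G_{t(t-1)} = G_{(t+w_k)(t+w_k+1)} = 0$.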
Due to the zero pattern established above and the symmetry of $G$, we further deduce that if a diagonal entry $d_{tt} = G_{tt}$ with $t \ge 2$ corresponds to a scalar $F_{\alpha,k}F_{\alpha,k}^T$ (i.e., $m_k = 1$), then $G_{t(t-1)} = 0$. (Recall that $G_{t(t+1)} = 0$ has been shown before.) Similarly, if $d_{tt}$ is the first diagonal entry of $F_{\alpha,k}F_{\alpha,k}^T$, then $G_{(t-1)t} = 0$.

Next, we show that $G$ is strictly diagonally dominant. For a given $G \in \mathbb R^{\ell\times\ell}$, define

$$\xi_1 := d_{11} - |\eta_1|, \qquad \xi_\ell := d_{\ell\ell} - |\eta_{\ell-1}|, \qquad \xi_i := d_{ii} - |\eta_{i-1}| - |\eta_i|, \quad \forall\, i \in \{2, \ldots, \ell-1\}.$$

In light of the entries of $G$ obtained above, we have, for each $k \in \{1, \ldots, L\}$:

(1.1) if $m_k = 1$, then $\xi_i = 1$.

(1.2) if $m_k > 1$ with $k = 1$, then (i) the corresponding $\xi_1 = d_{11} - |\eta_1| > \frac12 + \frac{h_{1,1}}{6}$; (ii) for $s = 2, \ldots, w_1$, the corresponding $\xi_s = d_{ss} - |G_{s(s-1)}| - |G_{s(s+1)}| > (h_{1,s-1} + h_{1,s})/6$; and (iii) the corresponding $\xi_{w_1+1} = d_{(w_1+1)(w_1+1)} - |G_{(w_1+1)w_1}| - |G_{(w_1+1)(w_1+2)}| > \frac12 + \frac{h_{1,w_1}}{6}$. Similar results can be obtained for $m_k > 1$ with $k = L$ using symmetry.

(1.3) if $m_k > 1$ with $k \in \{2, \ldots, L-1\}$, then (i) the corresponding $\xi_t = d_{tt} - |G_{t(t-1)}| - |G_{t(t+1)}| > \frac12 + \frac{h_{k,1}}{6}$; (ii) for $s = 1, \ldots, w_k - 1$, the corresponding $\xi_{t+s} = d_{(t+s)(t+s)} - |G_{(t+s)(t+s-1)}| - |G_{(t+s)(t+s+1)}| > (h_{k,s} + h_{k,s+1})/6$; and (iii) the corresponding $\xi_{t+w_k} = d_{(t+w_k)(t+w_k)} - |G_{(t+w_k)(t+w_k-1)}| - |G_{(t+w_k)(t+w_k+1)}| \ge \frac12 + \frac{h_{k,w_k}}{6}$.

Consequently, $\xi_i > 0$ for all $i$, and $G$ is strictly diagonally dominant. □
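The entries computed in Lemma 6.1 can be checked mechanically under an explicit assumption about the rows of one block $F_{\alpha,k}$. The sketch below assumes, as a hypothesis for illustration, that each row is a piecewise-linear "hat" weight. Such a row equals 1 at one free index and decays linearly to 0 across the adjacent gaps $h_{k,j}$. This is a structure for which $d_{11} = 1 + (h-1)(2h-1)/(6h)$ and $G_{s(s+1)} = (h^2-1)/(6h)$. The helper `hat_block` and the gap list `gaps` are hypothetical. The code builds $G = F F^T$ exactly, computes the quantities $\xi_i$, and lets one check that $G$ is tridiagonal, nonnegative and strictly diagonally dominant.

```python
from fractions import Fraction

def hat_block(gaps):
    """Hypothetical block F_{alpha,k}: row s is a hat weight centred at the s-th free
    index, with consecutive free indices gaps[s] columns apart."""
    centers = [sum(gaps[:s]) for s in range(len(gaps) + 1)]
    F = []
    for s, c in enumerate(centers):
        row = []
        for i in range(centers[-1] + 1):
            if i == c:
                w = Fraction(1)
            elif i < c and s > 0 and c - i < gaps[s - 1]:
                w = 1 - Fraction(c - i, gaps[s - 1])     # rising part over the left gap
            elif i > c and s < len(gaps) and i - c < gaps[s]:
                w = 1 - Fraction(i - c, gaps[s])         # falling part over the right gap
            else:
                w = Fraction(0)
            row.append(w)
        F.append(row)
    return F

gaps = [3, 2, 5, 4]                                          # h_{k,1}, ..., h_{k,w_k} >= 2
F = hat_block(gaps)
G = [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]   # G = F F^T
ell = len(G)

# xi_i: diagonal entry minus the deleted absolute row sum, as in the proof
xi = [G[i][i] - sum(abs(G[i][j]) for j in range(ell) if j != i) for i in range(ell)]
print([str(x) for x in xi])
```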

For the given $F_\alpha$, define $r_i$ as the sum of the entries in the $i$th row of $F_\alpha$, $i = 1, \ldots, \ell$. We have:

(i) if $m_k = 1$, then $r_i = 1$.

(ii) if $m_k > 1$ with $k = 1$, then (i) for $s = 1$, the corresponding $r_i = (h_{1,1} + 1)/2$; (ii) for $s = 2, \ldots, w_1$, the corresponding $r_i = (h_{1,s-1} + h_{1,s})/2$; and (iii) for $s = w_1 + 1$, the corresponding $r_i = (h_{1,w_1} + 1)/2$. Similar results can be obtained for $m_k > 1$ with $k = L$ using symmetry.

(iii) if $m_k > 1$ with $k \in \{2, \ldots, L-1\}$, then (i) the corresponding $r_i = (h_{k,1} + 1)/2$; (ii) for $s = 1, \ldots, w_k - 1$, the corresponding $r_i = (h_{k,s} + h_{k,s+1})/2$; and (iii) the corresponding $r_i = (h_{k,w_k} + 1)/2$.

Hence, each $r_i > 0$. Define the diagonal matrix

$$\Xi_\alpha := \operatorname{diag}\big(r_1^{-1}, \ldots, r_\ell^{-1}\big). \tag{6.1}$$

The next lemma shows that the (absolute) row sums of $F_\alpha F_\alpha^T$ coincide with those of $F_\alpha$.

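Lemma 6.2. For any $K_n$ and $\alpha$, $r_j = \sum_{k=1}^{\ell}(F_\alpha F_\alpha^T)_{jk}$ for each $j = 1, \ldots, \ell$.

Proof. Let $(F_\alpha)_{j\cdot}$ denote the $j$th row of $F_\alpha$. Then

$$\sum_{k=1}^{\ell}\big|(F_\alpha F_\alpha^T)_{jk}\big| = \Big\langle (F_\alpha)_{j\cdot},\ \sum_{k=1}^{\ell}\big((F_\alpha)_{k\cdot}\big)^T\Big\rangle = \big\langle (F_\alpha)_{j\cdot},\ \mathbf 1\big\rangle = \sum_{k=1}^{K_n+p}(F_\alpha)_{jk} = r_j,$$

where we use the nonnegativity of $F_\alpha F_\alpha^T$ (Lemma 6.1) and the fact that the sum of all rows of $F_\alpha$ is $\mathbf 1 := (1, \ldots, 1)$. □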
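Lemma 6.2 and the row-sum formulas preceding it can be verified exactly on a small example. This rests on a stated assumption: the rows of one block of $F_\alpha$ are taken to be piecewise-linear hat weights over gaps $h_{k,j} \ge 2$, each equal to 1 at its free index. Under that hypothesis the sketch checks three things. First, the columns of $F_\alpha$ sum to one. Second, each row sum of $F_\alpha$ equals the corresponding row sum of $F_\alpha F_\alpha^T$. Third, the row sums are $(h+1)/2$ at the ends of a block and $(h_{s-1}+h_s)/2$ in between. The helper `hat_block` and the gap list are hypothetical.

```python
from fractions import Fraction

def hat_block(gaps):
    """Hypothetical block of F_alpha built from hat weights over the given gaps."""
    centers = [sum(gaps[:s]) for s in range(len(gaps) + 1)]
    F = []
    for s, c in enumerate(centers):
        row = []
        for i in range(centers[-1] + 1):
            if i == c:
                w = Fraction(1)
            elif i < c and s > 0 and c - i < gaps[s - 1]:
                w = 1 - Fraction(c - i, gaps[s - 1])
            elif i > c and s < len(gaps) and i - c < gaps[s]:
                w = 1 - Fraction(i - c, gaps[s])
            else:
                w = Fraction(0)
            row.append(w)
        F.append(row)
    return F

gaps = [3, 2, 5, 4]
F = hat_block(gaps)
G = [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]   # F F^T
row_sums = [sum(row) for row in F]             # row sums of F_alpha
col_sums = [sum(col) for col in zip(*F)]       # should all equal 1 (partition of unity)
print([str(x) for x in row_sums])
```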

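Proposition 6.1. For any $K_n$ and $\alpha$, the following statements hold:

(1) the eigenvalues of $\Xi_\alpha F_\alpha\Lambda F_\alpha^T$ and $\Xi_\alpha F_\alpha F_\alpha^T$ are all positive reals;

(2) $\lambda_{\min}(\Lambda)\,\lambda_{\min}(\Xi_\alpha F_\alpha F_\alpha^T) \le \lambda_{\min}(\Xi_\alpha F_\alpha\Lambda F_\alpha^T)$ and $\lambda_{\max}(\Xi_\alpha F_\alpha\Lambda F_\alpha^T) \le \lambda_{\max}(\Lambda)\,\lambda_{\max}(\Xi_\alpha F_\alpha F_\alpha^T)$.

Proof. (1) For the diagonal matrix $\Xi_\alpha$, define $\Xi_\alpha^{1/2} := \operatorname{diag}\big(\sqrt{r_1^{-1}}, \ldots, \sqrt{r_\ell^{-1}}\big)$. Let $\sigma(A)$ denote the spectrum of a square matrix $A$, i.e., the collection of all eigenvalues of $A$. We thus have

$$\begin{aligned}\lambda\in\sigma\big(\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2}\big) &\iff \det\big(\lambda I - \Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2}\big) = 0\\ &\iff \det\big(\Xi_\alpha^{1/2}\big)\cdot\det\big(\lambda\,\Xi_\alpha^{-1} - F_\alpha\Lambda F_\alpha^T\big)\cdot\det\big(\Xi_\alpha^{1/2}\big) = 0\\ &\iff \det\big(\lambda\,\Xi_\alpha^{-1} - F_\alpha\Lambda F_\alpha^T\big) = 0\\ &\iff \det\big(\lambda I - \Xi_\alpha F_\alpha\Lambda F_\alpha^T\big) = 0\\ &\iff \lambda\in\sigma\big(\Xi_\alpha F_\alpha\Lambda F_\alpha^T\big).\end{aligned}$$

Since $\Lambda$ is positive definite and $F_\alpha$ has linearly independent rows, $\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2}$ is positive definite, so all the eigenvalues of $\Xi_\alpha F_\alpha\Lambda F_\alpha^T$ are positive reals. By replacing $\Lambda$ by the identity matrix, we see that the same holds for the eigenvalues of $\Xi_\alpha F_\alpha F_\alpha^T$.

(2) By Statement (1), $\lambda_{\min}(\Xi_\alpha F_\alpha\Lambda F_\alpha^T) = \lambda_{\min}(\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2})$ and $\lambda_{\max}(\Xi_\alpha F_\alpha\Lambda F_\alpha^T) = \lambda_{\max}(\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2})$. Further, for any $x \ne 0$,

$$\frac{x^T\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2}x}{x^Tx} = \frac{x^T\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2}x}{x^T\Xi_\alpha^{1/2}F_\alpha F_\alpha^T\Xi_\alpha^{1/2}x}\cdot\frac{x^T\Xi_\alpha^{1/2}F_\alpha F_\alpha^T\Xi_\alpha^{1/2}x}{x^Tx}.$$

Therefore, using the fact that all the eigenvalues of $\Lambda$ are positive, we obtain

$$\lambda_{\min}(\Lambda)\,\lambda_{\min}\big(\Xi_\alpha^{1/2}F_\alpha F_\alpha^T\Xi_\alpha^{1/2}\big) \le \lambda_{\min}\big(\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2}\big) \le \lambda_{\max}\big(\Xi_\alpha^{1/2}F_\alpha\Lambda F_\alpha^T\Xi_\alpha^{1/2}\big) \le \lambda_{\max}(\Lambda)\,\lambda_{\max}\big(\Xi_\alpha^{1/2}F_\alpha F_\alpha^T\Xi_\alpha^{1/2}\big).$$

Since $\lambda_{\min}(\Xi_\alpha^{1/2}F_\alpha F_\alpha^T\Xi_\alpha^{1/2}) = \lambda_{\min}(\Xi_\alpha F_\alpha F_\alpha^T)$ and $\lambda_{\max}(\Xi_\alpha^{1/2}F_\alpha F_\alpha^T\Xi_\alpha^{1/2}) = \lambda_{\max}(\Xi_\alpha F_\alpha F_\alpha^T)$, the desired inequalities follow. □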
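A 2×2 illustration of Statement (1), with hypothetical numbers: take $M = \begin{bmatrix}2&1\\1&2\end{bmatrix}$ in place of $F_\alpha\Lambda F_\alpha^T$ and $\Xi = \operatorname{diag}(1, 1/2)$. The nonsymmetric product $\Xi M$ and the symmetric matrix $\Xi^{1/2}M\Xi^{1/2}$ have the same trace and determinant. They therefore have the same characteristic polynomial, and hence the same (positive, real) eigenvalues:

$$\Xi M = \begin{bmatrix}2&1\\ \tfrac12&1\end{bmatrix},\qquad \Xi^{1/2}M\Xi^{1/2} = \begin{bmatrix}2&\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}&1\end{bmatrix},$$

$$\operatorname{trace} = 3,\qquad \det = \tfrac32,\qquad \lambda^2 - 3\lambda + \tfrac32 = 0 \ \Longrightarrow\ \lambda = \frac{3\pm\sqrt3}{2} > 0.$$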

The following proposition establishes uniform upper and lower bounds for the eigenvalues of $\Xi_\alpha F_\alpha F_\alpha^T$, regardless of $K_n$ and $\alpha$.

Proposition 6.2. For any $K_n$ and $\alpha$,

$$1/3 \le \lambda_{\min}(\Xi_\alpha F_\alpha F_\alpha^T) \le \lambda_{\max}(\Xi_\alpha F_\alpha F_\alpha^T) \le 1.$$

Proof. (1) Uniform upper bound. By Lemma 6.2, we have $\sum_{k=1}^{\ell}|(F_\alpha F_\alpha^T)_{jk}| = \sum_{k=1}^{\ell}G_{jk} = r_j$. Hence, $\sum_{k=1}^{\ell}(\Xi_\alpha F_\alpha F_\alpha^T)_{jk} = 1$ for all $j = 1, \ldots, \ell$. It follows from [18, Corollary 6.1.5] that $\lambda_{\max}(\Xi_\alpha F_\alpha F_\alpha^T) \le 1$.

(2) Uniform lower bound. To establish this bound, we exploit Geršgorin's Disc Theorem, e.g., [18, Theorem 6.1.1]. Notice that each $\xi_i$ defined in the proof of Lemma 6.1 is the difference between the $i$th diagonal entry of $G := F_\alpha F_\alpha^T$ and the deleted absolute row sum of the $i$th row of $G$. Hence, by Geršgorin's Disc Theorem applied to $\Xi_\alpha G$, whose eigenvalues are real by Proposition 6.1, we see that $\lambda_{\min}(\Xi_\alpha F_\alpha F_\alpha^T) \ge \min_i(\xi_i/r_i)$. Further, using the lower bounds on $\xi_i$ given in the proof of Lemma 6.1 and the expressions for $r_i$ given before Lemma 6.2, it is easy to verify that $\xi_i/r_i \ge 1/3$ for all $i$, $\alpha$ and $K_n$. This yields the desired uniform lower bound. □
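The Geršgorin step in the proof of Proposition 6.2 can be replayed numerically. This is illustrative only and rests on a hypothesis: the rows of a block of $F_\alpha$ are taken to be piecewise-linear hat weights over gaps $h_{k,j} \ge 2$. The hypothetical helper `gershgorin_interval` scales each row of $G = F_\alpha F_\alpha^T$ by $1/r_i$, i.e., forms $\Xi_\alpha G$. It then returns the interval covered by the Geršgorin discs. For every gap pattern tried, the interval is expected to lie inside $[1/3, 1]$, in line with the proposition.

```python
from fractions import Fraction

def hat_block(gaps):
    """Hypothetical block of F_alpha built from hat weights over the given gaps."""
    centers = [sum(gaps[:s]) for s in range(len(gaps) + 1)]
    F = []
    for s, c in enumerate(centers):
        row = []
        for i in range(centers[-1] + 1):
            if i == c:
                w = Fraction(1)
            elif i < c and s > 0 and c - i < gaps[s - 1]:
                w = 1 - Fraction(c - i, gaps[s - 1])
            elif i > c and s < len(gaps) and i - c < gaps[s]:
                w = 1 - Fraction(i - c, gaps[s])
            else:
                w = Fraction(0)
            row.append(w)
        F.append(row)
    return F

def gershgorin_interval(gaps):
    """Smallest interval containing all Gershgorin discs of Xi G, G = F F^T."""
    F = hat_block(gaps)
    G = [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]
    lo = hi = None
    for i, row in enumerate(G):
        r_i = sum(F[i])                                            # row sum of F
        centre = row[i] / r_i
        radius = sum(abs(v) for j, v in enumerate(row) if j != i) / r_i
        lo = centre - radius if lo is None else min(lo, centre - radius)
        hi = centre + radius if hi is None else max(hi, centre + radius)
    return lo, hi

for gaps in ([2, 2, 2], [3, 2, 5, 4], [10], [7, 30, 2]):
    lo, hi = gershgorin_interval(gaps)
    print(gaps, float(lo), float(hi))
```

The upper end equals 1 exactly because, by Lemma 6.2, each row of $\Xi_\alpha G$ sums to one.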

The next result establishes a uniform bound on the $\ell_\infty$-norm of $F_\alpha^T(F_\alpha\Lambda F_\alpha^T)^{-1}F_\alpha$.

Proposition 6.3. There exists $c_{\infty,p} > 0$ (dependent on $p$ only) such that for any $K_n$ and $\alpha$, $\|F_\alpha^T(F_\alpha\Lambda F_\alpha^T)^{-1}F_\alpha\|_\infty \le c_{\infty,p}$.

Proof. For any $K_n$ and any $\alpha$, let $\Xi_\alpha$ be as defined in (6.1). Then

$$\big\|F_\alpha^T(F_\alpha\Lambda F_\alpha^T)^{-1}F_\alpha\big\|_\infty = \big\|F_\alpha^T\cdot(\Xi_\alpha F_\alpha\Lambda F_\alpha^T)^{-1}\cdot(\Xi_\alpha F_\alpha)\big\|_\infty \le \|F_\alpha^T\|_\infty\cdot\big\|(\Xi_\alpha F_\alpha\Lambda F_\alpha^T)^{-1}\big\|_\infty\cdot\|\Xi_\alpha F_\alpha\|_\infty.$$

It is easy to verify that $\|F_\alpha^T\|_\infty = 1$. Furthermore, due to the definition of $\Xi_\alpha$, we have $\|\Xi_\alpha F_\alpha\|_\infty = 1$. In what follows, we show that $\|(\Xi_\alpha F_\alpha\Lambda F_\alpha^T)^{-1}\|_\infty$ is uniformly bounded, using the banded structure of the matrix and the technical results developed above. This gives rise to a uniform bound.

Let $F_\alpha$ have $\ell$ rows. We consider two cases as follows:

(i) $\ell \ge 2(p+1)$. In this case, by the structure of $F_\alpha$ shown in (5.6), we see via straightforward computation that $H := \Xi_\alpha F_\alpha\Lambda F_\alpha^T$ is a banded symmetric matrix with bandwidth $p$, i.e., $(H)_{ij} = 0$ whenever $|i - j| > p$. It is known from [46, Lemma 6.2] that for a fixed spline degree $p$, there exist positive constants $\underline\mu_p$ and $\overline\mu_p$ (dependent on $p$ only) such that $\underline\mu_p \le \lambda_{\min}(\Lambda) \le \lambda_{\max}(\Lambda) \le \overline\mu_p$ for any $K_n$. It thus follows from Propositions 6.1 and 6.2 that $\|H\|_2 = \lambda_{\max}(H) \le \overline\mu_p$, where $\overline\mu_p$ is independent of $K_n$ and $\alpha$. Similarly, $\|H^{-1}\|_2 \le 1/\lambda_{\min}(H) \le 3/\underline\mu_p$. Hence, for $F_\alpha$ with $\ell \ge 2(p+1)$, it follows from [8, Theorem 2.2] that there exists $c' > 0$ (independent of $K_n$ and $\alpha$) such that $\|(H^{-1})_{i\cdot}\|_1 \le c'$ for all $i = 1, \ldots, \ell$, where $(H^{-1})_{i\cdot}$ denotes the $i$th row of $H^{-1}$. In other words, $\|H^{-1}\|_\infty \le c'$.

(ii) $\ell < 2(p+1)$. For any $F_\alpha$ in this case, we introduce the block diagonal matrix $\widetilde H := \operatorname{diag}(H, 1, \ldots, 1)$ such that $\widetilde H$ has $2(p+1)$ rows. Hence, $\widetilde H$ is a banded symmetric matrix with bandwidth $p$ and satisfies $\|\widetilde H\|_2 \le \max(\overline\mu_p, 1)$ and $\|\widetilde H^{-1}\|_2 \le \max(3/\underline\mu_p, 1)$. Thus there exists $c'' > 0$ (independent of $K_n$ and $\alpha$) such that $\|H^{-1}\|_\infty \le \|\widetilde H^{-1}\|_\infty \le c''$.

Consequently, $c_{\infty,p} := \max(c', c'')$ is the desired uniform bound with respect to the $\ell_\infty$-norm. □
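The factorization used at the start of the proof of Proposition 6.3 can be checked on a small example. The sketch is illustrative only and makes two assumptions. First, the rows of $F_\alpha$ are taken to be hat weights over the gaps. Second, $\Lambda$ is replaced by a hypothetical positive definite tridiagonal matrix with diagonal $2/3$ and off-diagonal $1/6$, not the B-spline Gram matrix of the paper. The code verifies $\|F_\alpha^T\|_\infty = \|\Xi_\alpha F_\alpha\|_\infty = 1$. It also verifies $F_\alpha^T(F_\alpha\Lambda F_\alpha^T)^{-1}F_\alpha = F_\alpha^T(\Xi_\alpha F_\alpha\Lambda F_\alpha^T)^{-1}\Xi_\alpha F_\alpha$, so the $\ell_\infty$-norm of the left side is at most $\|(\Xi_\alpha F_\alpha\Lambda F_\alpha^T)^{-1}\|_\infty$.

```python
from fractions import Fraction

def hat_block(gaps):
    """Hypothetical F_alpha built from hat weights over the given gaps."""
    centers = [sum(gaps[:s]) for s in range(len(gaps) + 1)]
    F = []
    for s, c in enumerate(centers):
        row = []
        for i in range(centers[-1] + 1):
            if i == c:
                w = Fraction(1)
            elif i < c and s > 0 and c - i < gaps[s - 1]:
                w = 1 - Fraction(c - i, gaps[s - 1])
            elif i > c and s < len(gaps) and i - c < gaps[s]:
                w = 1 - Fraction(i - c, gaps[s])
            else:
                w = Fraction(0)
            row.append(w)
        F.append(row)
    return F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def inf_norm(A):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

F = hat_block([3, 2, 2])
m, n = len(F), len(F[0])
Lam = [[Fraction(2, 3) if i == j else Fraction(1, 6) if abs(i - j) == 1 else Fraction(0)
        for j in range(n)] for i in range(n)]
Xi = [[1 / sum(F[i]) if i == j else Fraction(0) for j in range(m)] for i in range(m)]
Ft = transpose(F)
G_alpha = matmul(matmul(Ft, inverse(matmul(matmul(F, Lam), Ft))), F)
H = matmul(matmul(Xi, F), matmul(Lam, Ft))                # Xi F Lam F^T
XiF = matmul(Xi, F)
print(float(inf_norm(G_alpha)), float(inf_norm(inverse(H))))
```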

Along with the above results, we finally complete the proof of the uniform Lipschitz property below.

Proof of Theorem 3.1. The uniform bound on $\|F_\alpha^T(F_\alpha\Lambda F_\alpha^T)^{-1}F_\alpha\|_\infty$ has been established in Proposition 6.3. The second statement follows directly from the continuous and piecewise linear property of $b(\cdot)$ and polyhedral theory [13, Proposition 4.2.2]. □
7. Proof of Theorem 3.2. We introduce some notation first. Let $\bar f_{(r)}$ be the spline estimator based on noise-free data, i.e., $\bar f_{(r)}(x) = \sum_{k=1}^{K_n+p}\bar b_kB_k(x)$, where

$$\bar b := \arg\min\ \tfrac12 b^T\Lambda b - b^T\mathbb E(\bar y). \tag{7.1}$$

Propositions 7.1 and 7.2 below give uniform bounds for the bias and stochastic terms of the estimation error in the sup-norm, respectively.

Proposition 7.1. If $1 \le r' \le r$, there exists a constant $C_{1r'}$, which depends on $r'$ only, such that

$$\sup_{f\in\mathcal C(r,L)}\|\bar f_{(r')} - f\|_\infty \le C_{1r'}\,L\,K_n^{-r}. \tag{7.2}$$

In particular, if $r \le 2$, then $C_{1r'}$ is independent of $r'$.

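Proof. Consider the case when $1 \le r \le 2$ first. Hence $\lceil r' - 1\rceil = 1$. Let $\tilde f$ be a piecewise linear function such that $\tilde f(\kappa_k) = f(\kappa_k)$. For any $x \in [\kappa_{k-1}, \kappa_k]$, $k = 1, \ldots, K_n$, there exist $\zeta_x, \iota_x \in (\kappa_{k-1}, \kappa_k)$ such that

$$\begin{aligned}|\tilde f(x) - f(x)| &= \Big|f(\kappa_{k-1}) + K_n\big(f(\kappa_k) - f(\kappa_{k-1})\big)(x - \kappa_{k-1}) - \big[f(\kappa_{k-1}) + f'(\iota_x)(x - \kappa_{k-1})\big]\Big|\\ &= \big|f'(\zeta_x) - f'(\iota_x)\big|\,(x - \kappa_{k-1}) \le L|\zeta_x - \iota_x|^{r-1}|x - \kappa_{k-1}| \le LK_n^{-r}.\end{aligned}$$

Thus $\|\tilde f - f\|_\infty \le LK_n^{-r}$. Let $\mathbf f := (f(x_1), \ldots, f(x_n))^T$ and $\tilde{\mathbf f} := (\tilde f(x_1), \ldots, \tilde f(x_n))^T$, and let $\tilde b$ be the optimal solution of (7.1) with $\mathbb E(\bar y)$ replaced by $X^T\tilde{\mathbf f}/\beta_n$. Since $\tilde f$ is a piecewise linear and convex function, we have $\tilde b = (\tilde f(\kappa_0), \ldots, \tilde f(\kappa_{K_n}))^T$. It follows from Theorem 3.1 that

$$\|\bar f_{(r')} - \tilde f\|_\infty \le \|\bar b - \tilde b\|_\infty \le \frac{c_{\infty,1}}{\beta_n}\big\|X^T(\mathbf f - \tilde{\mathbf f})\big\|_\infty \le \frac{c_{\infty,1}}{\beta_n}\|X^T\|_\infty\,\|\mathbf f - \tilde{\mathbf f}\|_\infty = c_{\infty,1}\varrho\,\|\mathbf f - \tilde{\mathbf f}\|_\infty,$$

where $\|X^T\|_\infty = \max_k\sum_{i=1}^nB_k(x_i)$ and $\varrho := \|X^T\|_\infty/\beta_n$. Therefore, letting $C_{1r'} := 1 + c_{\infty,1}\varrho$, which is independent of $r'$, we have

$$\|\bar f_{(r')} - f\|_\infty \le (1 + c_{\infty,1}\varrho)\,\|\tilde f - f\|_\infty \le (1 + c_{\infty,1}\varrho)\,LK_n^{-r} = C_{1r'}LK_n^{-r}.$$

Next, consider the case when $r > 2$. If $r' \le 2$, an argument similar to the one above yields (7.2). If $r' > 2$, it is shown in Theorem 3.3 that the unconstrained estimator and the constrained one are asymptotically equivalent. Since (7.2) holds for the unconstrained estimator [46], the proof is complete. □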
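The interpolation bound $\|\tilde f - f\|_\infty \le LK_n^{-r}$ from the first step of this proof can be checked numerically. The sketch assumes uniform knots $\kappa_k = k/K_n$, consistent with the factor $K_n$ in the slope above. It uses the hypothetical convex test function $f(x) = x^{3/2}$, whose derivative $\tfrac32\sqrt x$ is Hölder with exponent $r - 1 = 1/2$ and constant $L = 3/2$. The sup-norm is approximated on a fine grid, which can only underestimate the true supremum.

```python
def interp_error(f, K, grid=4000):
    """Grid approximation of sup |f~ - f| on [0, 1], where f~ is the piecewise-linear
    interpolant of f at the uniform knots k/K, k = 0, ..., K."""
    err = 0.0
    for m in range(grid + 1):
        x = m / grid
        k = min(int(x * K), K - 1)                 # x lies in [k/K, (k+1)/K]
        a, b = k / K, (k + 1) / K
        lin = f(a) + K * (f(b) - f(a)) * (x - a)   # same formula as in the proof
        err = max(err, abs(lin - f(x)))
    return err

f = lambda x: x ** 1.5     # f'(x) = 1.5 * sqrt(x): Hölder exponent 1/2, constant L = 1.5
L, r = 1.5, 1.5
for K in (4, 16, 64):
    print(K, interp_error(f, K), L * K ** (-r))
```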

Proposition 7.2. There exists a positive constant $C_{2r}$, which depends on $r$ only, such that for any $u > 0$,

$$P\big(\|\hat f_{(r)} - \bar f_{(r)}\|_\infty > u\big) \le (K_n + p)\exp\Big\{-\frac{nu^2}{2C_{2r}^2\sigma^2K_n}\Big\}. \tag{7.3}$$

In particular, if $r \in [1,2]$, then $C_{2r}$ is independent of $r$.
Proof. Recall that $p = \lceil r - 1\rceil$. By Theorem 3.1 and (2.2), we have

ll/(r) - /(r)lloo < sup |Cfc| = —7==\ SUp 

VPn k=l,...,K„+p V^/3.P ^ k=l,...,K,,+p 



where = SiLi-^l; (^«)^«/V^- Letting C2r := Coo,p / ^/Cp~p ^Nhicli is dependent on r only 
(but independent of r if r G [1, 2]), we have 

[K^ - 

ll/(r) -/(r)l|oo < CrC2ry— ^r, 

where $\zeta_{(r)} := \max_k|\zeta_k|$. Hence, by using the implication $Z \sim N(0,1) \Rightarrow P(Z > t) \le \tfrac12e^{-t^2/2}$, $t > 0$, we have

P(ll/„,-/HlU>..)<P(f.>j|^7J 
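The elementary Gaussian tail bound invoked above can be checked numerically with the standard library, together with the union bound over the $K_n+p$ coefficients that produces the factor $(K_n+p)$ in (7.3). The helper names are hypothetical. `math.erfc` gives the exact standard normal tail.

```python
import math

def normal_tail(t):
    """P(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def half_exp_bound(t):
    """The bound (1/2) exp(-t^2 / 2) used in the proof."""
    return 0.5 * math.exp(-t * t / 2.0)

def max_abs_tail_bound(m, t):
    """Union bound: P(max_{k<=m} |Z_k| > t) <= 2 m P(Z > t) <= m exp(-t^2 / 2)."""
    return m * math.exp(-t * t / 2.0)

for t in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(t, normal_tail(t), half_exp_bound(t))
```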



Let 



2r + 1 V n 



2 /logn 



It follows from Proposition 7.2 that, 

e(||/V) - /{r)lloo) ^ ^« + ^(ll/V) - /'Wlloo > t)dt 

< r„ + / {Kn+p) exp - 2 2 ^' 

= 0(T„). 

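In view of Proposition 7.1 and the above result, we deduce that

$$\mathbb E\|\hat f_{(r)} - f\|_\infty \le \|\bar f_{(r)} - f\|_\infty + \mathbb E\|\hat f_{(r)} - \bar f_{(r)}\|_\infty = O\big(LK_n^{-r} + \tau_n\big).$$

This shows Statement (1) of Theorem 3.2 by using the optimal choice of $K_n$.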

The next proposition establishes uniform bounds for the stochastic estimation error at a fixed point, as well as for the mean squared error.

Proposition 7.3. For any given $x_0 \in [0,1]$, there exist two positive constants $C_{3r}$ and $C_{4r}$, which depend only on $r$, such that

$$\mathbb E\big(|\hat f_{(r)}(x_0) - \bar f_{(r)}(x_0)|^2\big) \le C_{3r}\sigma^2n^{-1}K_n, \tag{7.4}$$

$$\mathbb E\big(|\hat f_{(r)}(x_0) - \bar f_{(r)}(x_0)|^4\big) \le C_{4r}\sigma^4n^{-2}K_n^2. \tag{7.5}$$



□ 
Furthermore,

$$\mathbb E\big(\|\hat f_{(r)} - \bar f_{(r)}\|_2^2\big) := \mathbb E\Big(\int_0^1|\hat f_{(r)}(x) - \bar f_{(r)}(x)|^2\,dx\Big) \le C_{3r}\sigma^2n^{-1}K_n. \tag{7.6}$$

In particular, if $r \in [1,2]$, then $C_{3r}$ and $C_{4r}$ are independent of $r$.

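Proof. Recall that $p = \lceil r - 1\rceil$, $N(x) = [B_1(x), \ldots, B_{K_n+p}(x)]^T \in \mathbb R^{K_n+p}$, and $X = [N(x_1), \ldots, N(x_n)]^T \in \mathbb R^{n\times(K_n+p)}$. Fix $x_0 \in [0,1]$. Let $h := N(x_0) \in \mathbb R^{K_n+p}$. Note that $h$ has at most $p$ nonzero elements, each of which is positive, and these nonzero elements sum to 1. Let $G_\alpha$ be the coefficient matrix of the linear selection function corresponding to an index set $\alpha$, i.e., $G_\alpha = F_\alpha^T(F_\alpha\Lambda F_\alpha^T)^{-1}F_\alpha$. Hence,

$$\|G_\alpha h\|_2^2 = \sum_i\Big(\sum_{h_j>0}(G_\alpha)_{ij}h_j\Big)^2 \le p\sum_i\sum_{h_j>0}(G_\alpha)_{ij}^2h_j^2 = p\sum_{h_j>0}h_j^2\sum_i(G_\alpha)_{ij}^2 \le p\sum_{h_j>0}h_j^2\,\|G_\alpha\|_\infty^2 \le p\,\|G_\alpha\|_\infty^2,$$

where the first inequality follows from the Cauchy-Schwarz inequality, and the second one is due to the symmetry of $G_\alpha$ and the implication

$$\sum_i(G_\alpha)_{ij}^2 \le \Big(\sum_i|(G_\alpha)_{ij}|\Big)^2 = \Big(\sum_i|(G_\alpha)_{ji}|\Big)^2 \le \|G_\alpha\|_\infty^2.$$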



As a result, in light of Theorem 3.1, we have

$$\max_\alpha\|G_\alpha h\|_2^2 \le p\cdot c_{\infty,p}^2. \tag{7.7}$$

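Let $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$ be i.i.d. random variables with mean zero and variance one, and let $z := (f(x_1), \ldots, f(x_n))^T$. Thus $\bar y = X^T(z + \sigma\epsilon)/\beta_n$. Hence

$$h^Tb(\bar y) = h^Tb\big(X^T(z + \sigma\epsilon)/\beta_n\big) = \frac{1}{\beta_n}h^TG_{\alpha(\bar y)}X^T(z + \sigma\epsilon).$$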

Furthermore, since $b(\cdot)$ is a continuous piecewise linear function on $\mathbb R^{K_n+p}$, so is $b\circ X^T$ on $\mathbb R^n$. It follows from polyhedral theory that $b\circ X^T$ admits a conic subdivision of $\mathbb R^n$ [13, 35], i.e., there exist a finite collection of polyhedral cones $\{\mathcal C_j\}_{j=1}^q$ and linear functions $\{g^j\}_{j=1}^q$ such that (i) $\bigcup_j\mathcal C_j = \mathbb R^n$; (ii) each cone $\mathcal C_j$ has nonempty interior; (iii) the intersection of any two cones is a common proper face of both cones; and (iv) $b\circ X^T$ coincides with $g^j$ on each $\mathcal C_j$. For any given $z' \in \mathbb R^n$, let $[z, z']$ be the line segment joining $z$ and $z'$. Starting from $z$, we assume that the segment $[z, z']$ intersects some cones of the conic subdivision at $z_1, z_2, \ldots, z_{\ell-1} \in \mathbb R^n$, and ends at $z'$. Further, each subsegment between two consecutive points, such as $[z, z_1], [z_1, z_2], \ldots, [z_{\ell-1}, z']$, belongs to a single cone. Hence there exist $\mu_i \in [0,1]$, $i = 1, \ldots, \ell$, with $\ell \le q$ and $\sum_{i=1}^{\ell}\mu_i = 1$, and matrices $G_{\alpha_i}$ such that

$$\begin{aligned}b(X^Tz') - b(X^Tz) &= G_{\alpha_1}X^T(z_1 - z) + G_{\alpha_2}X^T(z_2 - z_1) + \cdots + G_{\alpha_\ell}X^T(z' - z_{\ell-1})\\ &= \Big(\sum_{i=1}^{\ell}\mu_iG_{\alpha_i}\Big)X^T(z' - z),\end{aligned}$$
where $\mu_i$ and $G_{\alpha_i}$ depend on $z'$ for the fixed $z$. Since there are $q$ cones in the conic subdivision, we may use the extended tuple $(\mu_i, G_{\alpha_i})_{i=1}^q$ (corresponding to $z'$) to characterize $b(X^Tz') - b(X^Tz)$, by setting some $\mu_i = 0$ without loss of generality. Note that if $z'$ is a random variable, so is $(\mu_i, G_{\alpha_i})_{i=1}^q$.
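The path decomposition above is easy to see in one dimension. Along a segment $[z, z']$, a continuous piecewise linear function changes by the length-weighted average of its slopes, and this average plays the role of $\sum_i\mu_iG_{\alpha_i}$. The sketch below uses the hypothetical convex function $g(x) = \max(0, x, 2x - 3)$, with scalar slopes standing in for the matrices $G_{\alpha_i}$. The helper `segment_weights` returns the weights $\mu_i$, and the final line compares $(\sum_i\mu_i s_i)(z' - z)$ with $g(z') - g(z)$.

```python
from fractions import Fraction

def segment_weights(breaks, z, z1):
    """Split [z, z1] (z < z1) at the breakpoints; mu[i] is the fraction of the segment
    lying in linear piece i, the pieces being separated by the sorted `breaks`."""
    pts = [z] + [b for b in breaks if z < b < z1] + [z1]
    mu = [Fraction(0)] * (len(breaks) + 1)
    for a, b in zip(pts, pts[1:]):
        mid = (a + b) / 2
        i = sum(1 for t in breaks if t < mid)          # index of the piece containing (a, b)
        mu[i] += Fraction(b - a) / Fraction(z1 - z)
    return mu

breaks, slopes = [0, 3], [0, 1, 2]                     # g(x) = max(0, x, 2x - 3)
g = lambda x: max(0, x, 2 * x - 3)
z, z1 = -1, 5
mu = segment_weights(breaks, z, z1)
avg_slope = sum(m * s for m, s in zip(mu, slopes))     # analogue of sum_i mu_i G_{alpha_i}
print([str(m) for m in mu], avg_slope * (z1 - z), g(z1) - g(z))
```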

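Using $\bar y = X^T(z + \sigma\epsilon)/\beta_n$, we have, for the given vector $h$,

$$\mathbb E\big(|h^Tb(\bar y) - h^Tb(\mathbb E(\bar y))|^2\big) = \frac{1}{\beta_n^2}\,\mathbb E\Big(\mathbb E\big(|h^Tb(X^T(z + \sigma\epsilon)) - h^Tb(X^Tz)|^2 \,\big|\, (\mu_i, G_{\alpha_i})_{i=1}^q\big)\Big).$$

Moreover, for a fixed tuple $(\mu_i, G_{\alpha_i})_{i=1}^q$,

$$\begin{aligned}\mathbb E\big(|h^Tb(X^T(z + \sigma\epsilon)) - h^Tb(X^Tz)|^2 \,\big|\, (\mu_i, G_{\alpha_i})_{i=1}^q\big) &= \sigma^2\,\mathbb E\Big(\Big|\sum_{i=1}^q\mu_ih^TG_{\alpha_i}X^T\epsilon\Big|^2\Big)\\ &= \sigma^2\,\mathbb E\Big(\Big(\sum_{i=1}^q\mu_ih^TG_{\alpha_i}X^T\Big)\epsilon\epsilon^T\Big(\sum_{j=1}^q\mu_jXG_{\alpha_j}h\Big)\Big)\\ &= \sigma^2\Big(\sum_{i=1}^q\mu_ih^TG_{\alpha_i}X^T\Big)\Big(\sum_{j=1}^q\mu_jXG_{\alpha_j}h\Big)\\ &\le \sigma^2\Big(\sum_{i=1}^q\mu_i\|XG_{\alpha_i}h\|_2\Big)\Big(\sum_{j=1}^q\mu_j\|XG_{\alpha_j}h\|_2\Big)\\ &\le \sigma^2\big(\max_i\|XG_{\alpha_i}h\|_2\big)^2 \le \sigma^2\beta_n\lambda_{\max}(\Lambda)\big(\max_i\|G_{\alpha_i}h\|_2\big)^2,\end{aligned}$$

where the last inequality is due to $\|X\|_2^2 \le \beta_n\lambda_{\max}(\Lambda)$ and (7.7). Therefore,

$$\mathbb E\big(|h^Tb(\bar y) - h^Tb(\mathbb E(\bar y))|^2\big) \le \frac{\sigma^2}{\beta_n}\cdot\lambda_{\max}(\Lambda)\cdot p\cdot c_{\infty,p}^2.$$

Observing that $\lambda_{\max}(\Lambda)$ is uniformly bounded [46] and using the uniform bound on $\beta_n$ in (2.2), we obtain (7.4) for $p = \lceil r - 1\rceil$.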

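The above argument can be extended to prove (7.6). Indeed, let $h(x) := N(x)$. Thus $\hat f_{(r)}(x) - \bar f_{(r)}(x) = h^T(x)[b(\bar y) - b(\mathbb E(\bar y))]$, so that

$$\mathbb E\big(\|\hat f_{(r)} - \bar f_{(r)}\|_2^2\big) = \frac{1}{\beta_n^2}\,\mathbb E\Big(\mathbb E\Big(\int_0^1\big|h^T(x)[b(X^T(z + \sigma\epsilon)) - b(X^Tz)]\big|^2dx \,\Big|\, (\mu_i, G_{\alpha_i})_{i=1}^q\Big)\Big),$$

where, for a given tuple $(\mu_i, G_{\alpha_i})_{i=1}^q$,

$$\mathbb E\Big(\int_0^1\big|h^T(x)[b(X^T(z + \sigma\epsilon)) - b(X^Tz)]\big|^2dx \,\Big|\, (\mu_i, G_{\alpha_i})_{i=1}^q\Big) \le \sigma^2\|X\|_2^2\int_0^1\big(\max_i\|G_{\alpha_i}h(x)\|_2\big)^2dx \le \sigma^2\|X\|_2^2\cdot p\cdot c_{\infty,p}^2,$$

which yields (7.6).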

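To show (7.5), we consider $\mathbb E\big(|h^Tb(\bar y) - h^Tb(\mathbb E(\bar y))|^4\big)$. For a fixed tuple $(\mu_i, G_{\alpha_i})_{i=1}^q$, let $v := \sum_{i=1}^q\mu_iXG_{\alpha_i}h \in \mathbb R^n$, and recall that $\mathbb E(\epsilon_i^4) = 3$, $i = 1, \ldots, n$. We thus have

$$\begin{aligned}\mathbb E\big(|h^Tb(X^T(z + \sigma\epsilon)) - h^Tb(X^Tz)|^4 \,\big|\, (\mu_i, G_{\alpha_i})_{i=1}^q\big) &= \mathbb E\Big(\Big|\Big(\sum_{i=1}^q\mu_ih^TG_{\alpha_i}X^T\Big)\sigma\epsilon\cdot\Big(\sum_{i=1}^q\mu_ih^TG_{\alpha_i}X^T\Big)\sigma\epsilon\Big|^2 \,\Big|\, (\mu_i, G_{\alpha_i})_{i=1}^q\Big)\\ &= \sigma^4\,\mathbb E\big(|v^T\epsilon\epsilon^Tv|^2\big) = \sigma^4\,\mathbb E\Big(\Big(\sum_{i=1}^nv_i\epsilon_i\Big)^4\Big)\\ &= \sigma^4\Big(\sum_{i=1}^nv_i^4\,\mathbb E(\epsilon_i^4) + 3\sum_{i\ne j}v_i^2v_j^2\Big) = 3\sigma^4\Big(\sum_{i=1}^nv_i^2\Big)^2\\ &\le 3\sigma^4\big(\beta_n\lambda_{\max}(\Lambda)\cdot p\cdot c_{\infty,p}^2\big)^2,\end{aligned}$$

where the last inequality is due to $\|v\|_2^2 \le \beta_n\lambda_{\max}(\Lambda)\,p\,c_{\infty,p}^2$. This shows that

$$\mathbb E\big(|h^Tb(\bar y) - h^Tb(\mathbb E(\bar y))|^4\big) \le \frac{3\sigma^4}{\beta_n^2}\cdot\big(\lambda_{\max}(\Lambda)\cdot p\cdot c_{\infty,p}^2\big)^2.$$

Using the uniform bounds on $\lambda_{\max}(\Lambda)$ and $\beta_n$ again, we obtain (7.5). □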
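The moment identity behind (7.5) is $\mathbb E\big((\sum_iv_i\epsilon_i)^4\big) = 3\|v\|_2^4$ for i.i.d. standard normal $\epsilon_i$, which relies on $\mathbb E(\epsilon_i^4) = 3$. It can be sanity-checked by simulation. The vector `v`, the seed and the sample size below are arbitrary choices, and the helper name is hypothetical.

```python
import random

def fourth_moment_ratio(v, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[(v^T eps)^4] / ||v||_2^4 with eps ~ N(0, I)."""
    rng = random.Random(seed)
    norm2 = sum(x * x for x in v)
    total = 0.0
    for _ in range(n_samples):
        s = sum(x * rng.gauss(0.0, 1.0) for x in v)
        total += s ** 4
    return total / n_samples / norm2 ** 2

ratio = fourth_moment_ratio([0.5, -1.0, 2.0])
print(ratio)   # expected to be close to 3
```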

Propositions 7.1 and 7.3 imply that, for any $x_0 \in [0,1]$,

$$\sup_{f\in\mathcal C(r,L)}\mathbb E|\hat f_{(r)}(x_0) - f(x_0)|^2 \le C_{1r}^2L^2K_n^{-2r} + C_{3r}\sigma^2K_nn^{-1}. \tag{7.8}$$

This shows Statement (2) of Theorem 3.2 by using the optimal choice of $K_n$.

8. Proof of Theorem 3.4. Throughout this section, we shall use $C_j$ or $c_j$ with $j \in \mathbb N$ to denote positive constants that depend only on $L$ (and $\sigma$). We introduce some lemmas. The first lemma, as a complement to (3.4), provides a bound for the stochastic term of the estimation error in the sup-norm.

Lemma 8.1. There exists a constant $C_2 > 0$ such that

(8-1) - /hI|o„/{||/(,,, - /(,,||o„ > «}) 
Proof. Direct calculation yields that

^[\\f{r) - /(r)lloo/{||/(r) " /(r)l|oo > u]^ 

/OO /"OO 
P{\\hr)-flr)U>t)dt< J (g,. + l)exp{- ^ "^^^ t^jdt 

<\/|c..v^(A-„ + l)exp{-^£|-j}. 

This completes the proof. □ 

It is shown next that it is highly improbable that the estimated $\hat r$ is strictly smaller than the true $r$. We say a few words about notation. Recall that $T_n = \lceil(\log n)^2\rceil$, and define the set $\mathcal R := \{r_j \mid r_j = 1 + j/T_n,\ j = 0, 1, \ldots, T_n\}$.

Lemma 8.2. Let $r, d \in [1,2]$ with $d < r$. There exists $C_3 > 0$ such that

sup P{f = d) < CsTn.n'^. 

f&C„{T,L) 

Proof. By the definition of $\hat r$ given before Theorem 3.4,

sup P{f = d)< sup Poo{r\d)<Tn max sup Pooir',d), 

where 

Poo{r',d) := P(^\\f(r") -/(r')lloo > — V'(t-'))- 

Here $\psi(r')$ is defined in (3.6) and $r'' := \min\{r \in \mathcal R \mid r \ge d\}$, i.e., $r'' \in \mathcal R$ is closest to $d$ from above. Hence, $r'' \ge d > r'$. In view of (3.3) and (3.4),

ll/(r") - /(r')lloo < ||/(r") " /{r")lloo + ||/(r') " /{r')lloo + ||/(-r") " /lloo + ||/(r') " /lloo 
< ll/(r') - /(r')lloo + ||/(r") " /{r")lloo + C'l-^-f^(~/) + CiLK^J,,,^ 
= CiLK^Ji^ (1 +LOriy) + ||/(r') - /(r')lloo + ||/(r") ~ /(r")lloo' 

where 



K"^' T (r"-r') , 1 

I I (^') / /l0gn\ (2r." + l)(2r'+l) . / log ^ ^^ 25^^ 



ir") 



for a positive constant ci which is bounded away from zero and above. Since {n ^ log n)^^°s»^ _ 
as n — )■ OO, a;^/^^// converges to zero uniformly for all r',r" E TZ with the given r^. Let 

$P_\infty(r', d) \le p_{1,\infty} + p_{2,\infty}$, where

Pl,oo := -P|||/(r') - /(r')lloo > ^ ^{r')(l " o(l))|, 

p/llf f- II ^ (2^'' + 1) / \ 

P2,oo := -r I II J{r") - /{r")lloo > ^ ^{r")J- 
By using (3.4), (3.6) and the orders of $K_{(r')}$ and $K_{(r'')}$, we obtain two positive constants $c_2$ and $c_3$ such that

, , 2 _ 1 1 1 . ,^ 

Pi,oo < {K{r') + !)•"- < C2n (2r'+i) (logn) < 2d+i(logn) ^z", 

P2,oo < {K(r") + 1) • < C3n"(2'-''+i)(logn)"^^ < csn'adTi (log n)"^/^ 

Combining the above results, we see that the lemma holds. □

The following lemma develops a uniform bound on the sup-norm risk of $\hat f_{(r)}$ for $r \in [1,2]$.

Lemma 8.3. There exist positive constants $C_4$ and $C_5$ such that

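$$\sup_{r\in[1,2]}\ \sup_{f\in\mathcal C_H(r,L)}P\big(\psi(r)^{-1}\|\hat f_{(r)} - f\|_\infty > 1 + \sqrt2\big) \le C_4n^{-1},$$

$$\sup_{r\in[1,2]}\ \sup_{f\in\mathcal C_H(r,L)}\mathbb E\big(\psi(r)^{-2}\|\hat f_{(r)} - f\|_\infty^2\big) \le C_5.$$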

Proof. Since

$$\|\hat f_{(r)} - f\|_\infty \le \|\hat f_{(r)} - \bar f_{(r)}\|_\infty + \|\bar f_{(r)} - f\|_\infty \le \|\hat f_{(r)} - \bar f_{(r)}\|_\infty + C_1LK_{(r)}^{-r},$$

we have via (3.6) and $C_1LK_{(r)}^{-r} = \psi(r)/2$ that, for any $u > 1 + \sqrt2$,

sup Pjvrill/V) - /Hoc > n| < - /"mIIoo > (n - ^)V-'(.) 



/6CH(r,L) 



< {K^r) + l)e- 



- logn' 



2r+l 



In view of (3.5), $K_{(r)} = q(r)(n/\log n)^{1/(2r+1)}$ for some function $q(\cdot)$ that is positive and continuous on $[1,2]$. Hence, a constant $q^* > 0$ exists such that $K_{(r)} \le q^*(n/\log n)^{1/(2r+1)}$ for all $r \in [1,2]$. Applying this to $u = 1 + \sqrt2$, we obtain a positive constant $C_4$ such that for any $r \in [1,2]$,

{1 - 4^2+8 1 -, 

V'mII/m - /lloo > l + \/2 <C4n-^^(logn)-W < C^n-\ 

Using the above results, we further obtain a positive constant $C_5$ such that

e(V(;;ii/V)-/IIL) <(i + ^^)+ r p{i^ir]\\hr)-f\\lo>i)dt 



(1+V2) 

(2*1/2 _ 1)2 

-n-M -Mjexp <i - ■ 

(I+V2) 

< (1 + V2) + C5 • n"^ < (1 + V2) + C5 =: C5. 



<(1 + V2)+/ (ir(,) + l)exp - ^ / logn rf* 

J(i+V2) ^ 2r + 1 J 



This completes the proof. □ 
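We are ready to prove the theorem. Observe that

$$\sup_{f\in\mathcal C_H(r,L)}\mathbb E\big(\psi(r)^{-1}\|\hat f_{(\hat r)} - f\|_\infty\big) \le R^- + R^+,$$

where $R^- := \sup_{f\in\mathcal C_H(r,L)}\mathbb E\big(\psi(r)^{-1}\|\hat f_{(\hat r)} - f\|_\infty I\{\hat r < r\}\big)$ and $R^+ := \sup_{f\in\mathcal C_H(r,L)}\mathbb E\big(\psi(r)^{-1}\|\hat f_{(\hat r)} - f\|_\infty I\{\hat r \ge r\}\big)$. Hence, it suffices to show that

$$\limsup_{n\to\infty}\ \sup_{r\in[1,2]}R^- = 0, \tag{8.2}$$

$$\limsup_{n\to\infty}\ \sup_{r\in[1,2]}R^+ < \infty. \tag{8.3}$$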

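We first prove (8.2). By the definition of $\hat r$ given before Theorem 3.4,

$$R^- \le \sum_{r > r'\in\mathcal R}\ \sup_{f\in\mathcal C_H(r,L)}\mathbb E\big(\psi(r)^{-1}\|\hat f_{(r')} - f\|_\infty I\{\hat r = r'\}\big) \le \rho_1 + \rho_2,$$

where, in view of (3.3) and $C_1LK_{(r')}^{-r'} = \psi(r')/2$ (for $\rho_2$ below),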

■= V sup P{r = r')ip7^. ( JV'(r') + ^^{r')) > 

r>r'e7e/eCH{r,L) ^ ' 

P2 ■■= ^ %^ + 11/(^0 ~ /(r')lloo ^{||/(r') " /(r')lloo > ^V'(r')} 

r>r'67e 

We will prove that $\sup_{r\in[1,2]}\rho_j \to 0$ as $n \to \infty$, $j = 1, 2$. Since $r' < r$, it is easy to see that there exists a positive constant $c_6$ such that

ip{r') /logn\ (2r+'i){2r'+i) . /logn\ ipfTT) . 

< C6 < C6 < C6. 

\ n J \ n J 

It thus follows from Lemma 8.2 that, as $n \to \infty$,

^ (1 + \/2)c6 / ^ (1 + \/2)c6 , ^ 

Further, from (3.4), we deduce the existence of a constant $c_7 > 0$ such that

\/2 

r>r'e7^ 

^(r'l fi, - - ,, 1 -1 

2^ / ^1 II Ar-') - /(r-')lloo > ^ < • C6 • C7 • n^^. 

Also, it follows from Lemma 8.1 that a constant $c_8 > 0$ exists such that for all large $n$,

X] V'(";.}lE(||/(r') - /{r')lloo-^{||/(r') - /(r')lloo > ^V'(r')}^ 
r>r'e'R. 

= ^(;Jc8(logn)2(2'-'+i) -n"^ < T„V(",)C8n-i 

r>r'e7^ 
By virtue of the above results, we have $\rho_2 \to 0$ as $n \to \infty$. This yields (8.2).

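We now prove (8.3). Consider the random event $\mathcal H(r, r') := \{\psi(r)^{-1}\|\hat f_{(r')} - f\|_\infty > 1 + \sqrt2,\ r' \in \mathcal R\}$. Then

$$R^+ \le \sup_{f\in\mathcal C_H(r,L)}\ \sum_{r\le r'\in\mathcal R}\mathbb E\big(\psi(r)^{-1}\|\hat f_{(r')} - f\|_\infty I\{\hat r = r'\}\big)$$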



< (1 + V2) sup P{r > r} 

.feC„{r,L) 

+ KUps\\fir')-f\\ooI{{f = r'}n^r,r')} 

<(1 + V2)+ Yl ( s^^P (lE(V'(;;il/V')-/llL))'^' «^P Pfir^r')], 

where $p_f(r, r') := P(\{\hat r = r'\}\cap\mathcal H(r, r'))$ for $r' \in \mathcal R$. Let $r_* := \min\{r' \in \mathcal R : r' \ge r\}$. Hence, $r_* - r \in [0, 1/T_n)$. In view of (3.6), we have $\psi(s) = \varrho(s)\cdot(n^{-1}\log n)^{s/(2s+1)}$ for a function $\varrho(\cdot)$ that is positive and continuous on $[1,2]$. Let $\varrho_* := \min_{s\in[1,2]}\varrho(s) > 0$. This shows that

Vv.)=p(r.)(jfi)'^< [l + ^''--'^7''-' ]p(r)(i^)'^= [l + o(l)]V.,,,. 

Hence, if f = r' > r, then, by the definition of f G 7^, — /(r,)||oo < ^(1 + \/2)V'r, < 

i(l + \/2 + o(l))V'(r)- Therefore, 

i^ir)\\f{r') - /Hoc < • (ll/(r') " /(r.)l|oo + ||/V.) " /lU) 

< 1(1+ v2+o(i)) • [V'(7,;)ii/V,) - /Hoc] 

< 1 (1 + V2 + 0(1)) + [1 + 0(1)] . i^-]^ 114.) - /Hoc. 
It follows from Lemma 8.3 that for all large n, 

^I^^M 114') - fWlo} < 2[4 + (1 + 0(1)) • E{i>~l^ 114.) - fWD] < 2(4 + 2C5). 
We consider $p_f(r, r')$ next. By Lemma 8.3, we have, when $r' = r$,

$$\sup_{f\in\mathcal C_H(r,L)}p_f(r, r) \le \sup_{f\in\mathcal C_H(r,L)}P\big(\mathcal H(r, r)\big) \le C_4n^{-1}.$$

Now consider $p_f(r, r')$ with $r' > r$. Since $\bar f_{(s)}(\cdot)$ is a continuous function for any $s$, there exists a nonrandom point $t_* \in [0,1]$ such that $|\bar f_{(r')}(t_*) - \bar f_{(r_*)}(t_*)| = \|\bar f_{(r')} - \bar f_{(r_*)}\|_\infty$. Let $\theta_* := (\hat f_{(r_*)}(t_*) - \bar f_{(r_*)}(t_*)) - (\hat f_{(r')}(t_*) - \bar f_{(r')}(t_*))$. Therefore, if $\hat r = r' > r$, then

ll/{r') - /(r.)lloo < |/{r')(**) " /{r,)(**)| + < ll/{r') " /(r.)||oo + 

< 1(1 + V2 + 0(1)) V'(.) + 1^*1 . 
Using this result and $\|\bar f_{(r_*)} - f\|_\infty \le \psi(r_*)/2$, we have

ll/(r') - /lloo < ||/(r') - /(r*)||oo + \\f(r') " /(r')l|oo + ||/(r.) " /||oo 

< ^(1 + V2 + 0(1))V(.) + + II /V) - /(r')ll°o + ^-^Y^^{r)- 

As a result, for $r < r' \in \mathcal R$, we further deduce via Markov's inequality that
Pf{r,r') < P{\e\ + ll/V') - /(rOlloo > 

< ^'(ll/V.') - /Colloo > ^V'C..)) +^^(1^*1 > ^"^"^^^"^^V m) 

< P(||/(,') - /(.Olloo > ^V'Cr.)) + 10' • i'l.lym.l^)- 



It follows from (3.4) and r* < r' < 2 that there exist constants cg > and a > (indepen- 
dent of r G [1,2]) such that r' > =^ K^j.^)/ K^j./-^ > a(n/ log n)^/^^^^") > 1 for all large n 
and that 

f(r') - f(r') lU > ^V'Cr,) < ^9 h— - • exp - — — - • — — log n 



1 



Using Proposition 7.3, we also obtain a constant $c_{10} > 0$ such that

E(|e*P) < 2[E(|/V.)(t.) - /(,.)(t*)P) +E(|/(.')(i*) - /(r-')(**)l') 

< Cio < ^ • Clo . 

n n 

Noting that $\psi(r_*)^2 = \delta\,(K_{(r_*)}/n)\log n$ for a positive constant $\delta$ independent of $r_*$, we have $\psi(r_*)^{-2}\,\mathbb E(|\theta_*|^2) \le 2c_{10}(\delta\log n)^{-1}$. Hence, there exists a constant $c_{11} > 0$ (independent of $r, r' \in [1,2]$) such that for all $n$ sufficiently large,

p)^\r,r')<cio-{log7iy^^^ =^ R+ < (1 + ^2) + ^2(4 + 2C5) • ^lo- 

This leads to (8.3), and thus completes the proof of Theorem 3.4. □

9. Proof of Theorem 3.5. We establish the following lemma, to be used for the analysis of the risk of $\hat f(x_0)$.

Lemma 9.1. Suppose that $f$ is convex and differentiable on $[0,1]$. Then there exists a positive constant $C_0$ independent of $f$ such that for each $j$,

$$\mathbb E\big(|\hat f_j(x_0) - f(x_0)|^2I_j\big) \le C_0\,2^jn^{-4/5}\sigma^2. \tag{9.1}$$

Proof. Recall that Cfc = - 1)M„ + ^^^^), V A; = 1, . . . , Kn,j Let / and / be two 
piecewise linear functions such that /(Cfc) = ^{Vk.j) and /(Cfc) = fiCk), respectively. Note 
that if $M_n$ is odd, then $\zeta_k$ is a design point, so that $\mathbb E(\bar y_k) - f(\zeta_k) = 0$. Otherwise, direct calculation yields that



1 r 

E(yfc) - /(Cfc) = jfYl [/(^(fc-i)A/„+j) - /(Cfc) 

M„/2 



^ Yl [(/(^(fc-l)M„+M„-i+l) - /(Cfc)) - (/(Cfc) - /(x(fc_i)M„+i)) 

/(a;(fc-i)M„+A//„-i+i) - /(Cfc) /(Cfc) - /(a;(fc_i)Af„+j) 



M„/2 jv/„ • 



n 



i^-j)/n {^-j)/n 
Since $f$ is a convex function, we have $\mathbb E(\bar y_k) - f(\zeta_k) \ge 0$ and

"/(Cfc+i)-/(Cfc) /(Cfc) - /(Cfc^-i) 



^fe) - /(Cfc) < E ^ 



^A2/(Cfc^+i). 



Hence, for any $x_0 \in (\zeta_{d_n}, \zeta_{d_n+1}]$,
< /(xo) - /(xo) < -max{AV(Cd„+i), A2/(Cd„+2)} < AV(Cd.+i) + A2/(Cd„+2)
< A/(Cd„+2)-A/(CdJ. 
Further, since $f$ is convex, $f(x_0) - f(\zeta_{d_n}) \ge f'(\zeta_{d_n})(x_0 - \zeta_{d_n})$, so that

< /(xo) - /(^o) = /(CdJ + i^n(/(Cd„+i) - /(CdJ)(xo - CdJ - f{xo) 

< [/(Cd„+i) - fiCdJ - K~'f'iCd,J]Kn{xo - CdJ 

< [/(Cd„+i) - fiCdJ] - [fiCdJ - /(Cd„-i)] 

< A/(Cd„+2)-A/(CrfJ. 

Let Tj := E(Ayrf„+4j - Ayd^„2j)- Hence, < /(xq) - /(xq) < 2tj. 

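Notice that for $x_0 \in (\zeta_{d_n}, \zeta_{d_n+1}]$, there exists $\mu \in (0,1]$ such that $\hat f_j(x_0) = \mu\hat f(\zeta_{d_n}) + (1-\mu)\hat f(\zeta_{d_n+1})$ and $\bar f_j(x_0) = \mu\bar f(\zeta_{d_n}) + (1-\mu)\bar f(\zeta_{d_n+1})$, where $\hat f$ and $\bar f$ are the piecewise constant splines (i.e., $p = 0$) corresponding to the convex constrained least squares for $(y_{k,j})$ and $(\mathbb E(y_{k,j}))$, respectively. It follows from Proposition 7.3 that a positive constant $c_{11}$ exists such that

$$\begin{aligned}\mathbb E\big[(\hat f_j(x_0) - \bar f_j(x_0))^2I_j\big] &\le 2\Big(\mu^2\,\mathbb E|\hat f(\zeta_{d_n}) - \bar f(\zeta_{d_n})|^2 + (1-\mu)^2\,\mathbb E|\hat f(\zeta_{d_n+1}) - \bar f(\zeta_{d_n+1})|^2\Big)\\ &\le 2\big[\mu^2 + (1-\mu)^2\big]\,C_3\sigma^2K_{n,j}n^{-1} \le c_{11}\sigma^2\,2^jn^{-4/5}.\end{aligned} \tag{9.2}$$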

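This implies

$$\begin{aligned}\mathbb E\big[(\hat f_j(x_0) - f(x_0))^2I_j\big] &\le 2\Big(\mathbb E\big[(\hat f_j(x_0) - \bar f_j(x_0))^2I_j\big] + \mathbb E\big[(\bar f_j(x_0) - f(x_0))^2I_j\big]\Big)\\ &\le 8\tau_j^2\,\mathbb E(I_j) + 2\,\mathbb E\big[(\hat f_j(x_0) - \bar f_j(x_0))^2\big] \le 8\tau_j^2\,\mathbb E(I_j) + c_{11}\sigma^2\,2^{j+1}n^{-4/5}.\end{aligned}$$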
If Tj < IXpl'^'^^-nr'^l^a ^ then (9.1) holds. We thus only consider the case when Tj >
2A2-'/^"'"^n"^/^cj. In this case, note that Aydn+4:,j — ^yd„-2,j has a normal distribution with 
mean tj and variance 2^~^'^n~'^^^a^ . Consequently, 
<P\Z<X- -4 <P\Z< ( < exp ' ^



where Z is a standard normal random variable. In view of sup^^Q z exp^—^^) = 2p~^e~^, 
we obtain r|E(/j) < 2e-'^2^+'^n-^^^a'^ . Hence (9.1) holds. □ 

With this lemma, we prove the theorem as follows.

Proof of Theorem 3.5. Recall that $K_{n,j} := 2^jn^{1/5}$. If $f \in \mathcal C_H(r,L)$ with $r \in [1,2]$, then $\tau_j = \Delta f(\zeta_{d_n+2}) - \Delta f(\zeta_{d_n}) \le 2LK_{n,j}^{-r} = 2L(2^{-j}n^{-1/5})^r$. Let $J$ be the smallest natural number (dependent on $r$) such that
(9.3) 2-^n 5>L2'- + lfJ 2r + l,^ 2r + l.

If j > J, then 2^'^ > L2'^/(2'-+i)o-2r/(2r+i)^(4r-2r2)/(i0r+5)_ Hence, this shows via (9.3) that, 
for j > J, 

(9.4) Tj < 2L • 2"^'' • n"i < 2L2^o-2^n~27TT < 2 • 2^n"§o-. 

Recall that P(Z > A) < 1/4 for the positive A. Then there exists a 5 > such that 
7 := P{Z > X - 5) < 1/4. Note that < 1. Given this S, choose K e N (independent 
of r) such that 1 < 6 ■ 2^/"^. Since only one Ij / and l| = Ij, |/(xo) - f{xQ)\^ = 
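$\sum_{j=0}^{\infty}(\hat f_j(x_0) - f(x_0))^2I_j$, so that the risk of $\hat f$ is decomposed into two terms:

$$\mathbb E|\hat f(x_0) - f(x_0)|^2 = \sum_{j=0}^{J+K}\mathbb E\big[(\hat f_j(x_0) - f(x_0))^2I_j\big] + \sum_{j=J+K+1}^{\infty}\mathbb E\big[(\hat f_j(x_0) - f(x_0))^2I_j\big]. \tag{9.5}$$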
We consider the first sum in (9.5). It follows from Lemma 9.1 that 
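$$\sum_{j=0}^{J+K}\mathbb E\big[(\hat f_j(x_0) - f(x_0))^2I_j\big] \le C_0\sigma^2n^{-4/5}\sum_{j=0}^{J+K}2^j \le C_0\sigma^2\,2^{J+K+1}n^{-4/5}.$$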

Since J is the smallest integer satisfying (9.3), we have 

(9.6) 2'^n"5 < 2 • L2^cj"2^n~2lTT. 
Therefore, 

J+K 



(9.7) Y ^[(/(^o) - f{xo)flj] < C762^+2^5;^cTWn-5^. 

j=0 
Consider the second sum in (9.5). We show two technical results. First, since the design points corresponding to $\Delta\bar y_{d_n+4,i} - \Delta\bar y_{d_n-2,i}$ are disjoint for different $i$, these differences are independent. Hence, for $j > J + K$,

Wj)< n ^(Ayd„+4,i-Ayrf„_2,i>A2t+in-ta) 

i=J+K 

Second, in view of the argument for (9.2) and Proposition 7.3, we have $\mu \in [0,1]$ and a constant $c_{12} > 0$ such that

K\fj{xo)-f{^ot 

< 8(/E|/(a) - fiCajf + (1 - /i)%|/(a+i) - /(a+i)r) 

Using these results and $\gamma \in (0, 1/4)$, we obtain a constant $c_{13} > 0$ such that

00 

e((/(xo)-/(xo))2/,) 

j=J+K+l 

< Yl (iE|7(xo)-/(xo)|'-IE(/|)) < E ci22%-ta2.7^'^ 

j=J+K+l j=J+K+l 

00 

< ci2(47)(^-^)/^-^/22-^n-ta2 < ci^l'n-la^ 

j=J+K+l 

Furthermore, since $f$ is convex, $\tau_j = \mathbb E(\Delta\bar y_{d_n+4,j} - \Delta\bar y_{d_n-2,j})$ is a decreasing function of $j$. Therefore, in view of $0 \le \bar f_j(x_0) - f(x_0) \le 2\tau_j$ and (9.4),

00 00 

Y e((/(xo) - /(xo))'/,) < 4rJ • E(/,) < 4rJ < 16 • 2'n-la\ 

j=J+K+l j=J+K+l 

By virtue of the above results and (9.6), we obtain $c_{14} > 0$ such that

00 

(9.8) Yl mlixo)- f{xo)flj] < cuL^a^n-^. 

j=.J+K+l 

Hence, the theorem follows by combining (9.7) and (9.8). □ 

10. Proof of Theorem 3.6. Recall that $f_y := (\hat f^{[p]}(x_1), \ldots, \hat f^{[p]}(x_n))^T$, with coefficient matrices $A_\alpha$ defined in (3.10), and $f := (f(x_1), \ldots, f(x_n))^T$.

Lemma 10.1. Fix a spline degree $p$. The following hold:

(1) For any index set $\alpha$, $0 \le \operatorname{trace}(A_\alpha) \le c_{\infty,p}(K_n + p)$;

(2) $\mathbb E[\langle y - f, f_y - f\rangle] = \sigma^2\,\mathbb E[\operatorname{trace}(A_{\alpha(y)})] \le \sigma^2c_{\infty,p}(K_n + p)$.

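Proof. (1) Since $A_\alpha$ is symmetric positive semidefinite, its trace is nonnegative. For a given $\alpha$, recall that $G_\alpha := F_\alpha^T(F_\alpha\Lambda F_\alpha^T)^{-1}F_\alpha$. Hence $A_\alpha = XG_\alpha X^T/\beta_n$. By virtue of $\|G_\alpha\|_\infty \le c_{\infty,p}$ from Theorem 3.1 and $|(\Lambda)_{ij}| \le 1$ for all $i, j$ (by the definition of $\Lambda$), we have, for any $\alpha$,

$$0 \le \operatorname{trace}(A_\alpha) = \operatorname{trace}\Big(G_\alpha\cdot\frac{X^TX}{\beta_n}\Big) = \operatorname{trace}(G_\alpha\Lambda) \le c_{\infty,p}(K_n + p).$$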

(2) Since $f_y: \mathbb R^n \to \mathbb R^n$ is continuous and piecewise linear, it admits a conic subdivision of $\mathbb R^n$ [13, 35], i.e., there exist a finite collection of polyhedral cones $\{\mathcal C_j\}_{j=1}^q$ and linear functions $\{g^j\}_{j=1}^q$ satisfying conditions similar to those specified in the proof of Proposition 7.3. In particular, each cone $\mathcal C_j$ has nonempty interior and $f_y$ coincides with $g^j$ on each $\mathcal C_j$. Clearly, $g^j(y) = A_\alpha y$ for some index set $\alpha$; in this case, we write the cone $\mathcal C_j$ as $\mathcal C_\alpha$. Let $\operatorname{int}(\mathcal C_\alpha)$ denote the interior of $\mathcal C_\alpha$. Obviously, $f_y$ is differentiable on $\operatorname{int}(\mathcal C_\alpha)$; indeed, the (Fréchet) derivative of $f_y$ is $A_\alpha$ for any $y \in \operatorname{int}(\mathcal C_\alpha)$. Let $h(y) := f_y - f$. Since $\mathbb R^n\setminus\big(\bigcup_j\operatorname{int}(\mathcal C_j)\big)$ has zero measure, $h$ is almost differentiable on $\mathbb R^n$ in the sense of [39, Definition 1]. Let $\phi(z)$ be the normal density on $\mathbb R^n$ with mean zero and covariance $\sigma^2I$. We have



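$$\mathbb E\Big\|\frac{\partial h}{\partial y}\Big\| = \sum_\alpha\int_{\{z\,:\,z+f\in\operatorname{int}(\mathcal C_\alpha)\}}\|A_\alpha\|\,\phi(z)\,dz \le \max_\alpha\|A_\alpha\| < \infty.$$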



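Letting $Z := \sigma(\epsilon_1, \ldots, \epsilon_n)^T$, we have $\mathbb E[\langle y - f, f_y - f\rangle] = \mathbb E[\langle Z, h(y)\rangle] = \operatorname{trace}\big(\mathbb E[Z\cdot h^T(y)]\big)$. By the above results and Stein's Lemma [39, Lemma 2],

$$\begin{aligned}\operatorname{trace}\big(\mathbb E[Z\cdot h^T(y)]\big) &= \sigma^2\operatorname{trace}\Big(\mathbb E\Big[\frac{\partial h}{\partial y}\Big]\Big) = \sigma^2\int_{\{z\,:\,z+f\in\bigcup_j\operatorname{int}(\mathcal C_j)\}}\operatorname{trace}\Big(\frac{\partial h}{\partial y}(z + f)\Big)\phi(z)\,dz\\ &= \sigma^2\sum_\alpha\int\operatorname{trace}(A_\alpha)\,\phi(z)\,\mathbf 1_{\{z\,:\,z+f\in\operatorname{int}(\mathcal C_\alpha)\}}\,dz = \sigma^2\,\mathbb E\big(\operatorname{trace}(A_{\alpha(y)})\big).\end{aligned}$$

Statement (2) thus follows from (1). □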
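On a single cone, $f_y = A_\alpha y$ is linear, and statement (2) reduces to the classical identity $\mathbb E\langle y - f, Ay - f\rangle = \sigma^2\operatorname{trace}(A)$ for $y = f + Z$. The Monte Carlo sketch below checks this for a hypothetical smoother $A$, namely the projection onto constant vectors in $\mathbb R^4$ (trace 1), and a hypothetical mean vector $f$. It illustrates the identity only and does not implement the constrained estimator. The helper name `stein_check` is hypothetical.

```python
import random

def stein_check(A, f, sigma, n_samples=100_000, seed=1):
    """Monte Carlo estimate of E<y - f, A y - f> with y = f + sigma * eps,
    returned together with the exact value sigma^2 * trace(A)."""
    rng = random.Random(seed)
    n = len(A)
    total = 0.0
    for _ in range(n_samples):
        z = [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [fi + zi for fi, zi in zip(f, z)]
        fit = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
        total += sum(zi * (fit_i - fi) for zi, fit_i, fi in zip(z, fit, f))
    return total / n_samples, sigma ** 2 * sum(A[i][i] for i in range(n))

A = [[0.25] * 4 for _ in range(4)]          # projection onto constant vectors
estimate, exact = stein_check(A, [1.0, 2.0, 3.0, 4.0], sigma=2.0)
print(estimate, exact)
```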

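Equipped with the above lemma, we prove Theorem 3.6 below.

Proof of Theorem 3.6. The MLE of $\sigma^2$ is $\hat\sigma^2 = \|y - f_y\|_2^2/n$. Let $R_n := \|y - f\|_2^2 - \|y - f_y\|_2^2$, so that $\hat\sigma^2 = \|y - f\|_2^2/n - R_n/n$. Since

$$R_n = \|y - f\|_2^2 - \|y - f_y\|_2^2 = 2\langle y - f, f_y - f\rangle - \|f_y - f\|_2^2,$$

we have, by Lemma 10.1 and (7.8),

$$|\mathbb E(R_n)| \le 2\sigma^2\,\mathbb E\big(\operatorname{trace}(A_{\alpha(y)})\big) + \mathbb E\|f_y - f\|_2^2 \le 2\sigma^2c_{\infty,p}(K_n + p) + \mathbb E\|f_y - f\|_2^2,$$

where $p = \lceil r - 1\rceil$. This shows that $|\mathbb E(R_n)| = O(K_n + nK_n^{-2r})$. Hence, we deduce that (i) if $K_n = o(n)$ with $K_n \to \infty$ as $n \to \infty$, then $\hat\sigma^2 \to \sigma^2$ in probability; (ii) if $K_n = o(\sqrt n)$ with $K_n \to \infty$ as $n \to \infty$, then $\sqrt n(\hat\sigma^2 - \sigma^2)$ is asymptotically normal with mean zero and variance $2\sigma^4$; and (iii) if $K_n$ is of order $n^{1/(2r+1)}$, then $|\mathbb E(\hat\sigma^2 - \sigma^2)|$ is of order $n^{-2r/(2r+1)}$. □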